Abstract

The capacity of a fading channel can be substantially increased by feeding back channel state 
information from the receiver to the transmitter. With limited-rate feedback, what state information to feed
back and how to encode it are important open questions. This paper studies power loading in a multicarrier
system using no more than one bit of feedback per sub-channel. The sub-channels can be correlated and 
full channel state information is assumed at the receiver. First, a simple model with N parallel two-state 
(good/bad) memoryless sub-channels is considered, where the channel state feedback is used to select 
a fixed number of sub-channels to activate. The optimal feedback scheme is the solution to a vector 
quantization problem, and the associated performance for large N is characterized by a rate distortion 
function. As N increases, we show that the loss in forward rate from the asymptotic (rate-distortion)
value decreases as $(\log N)/N$ and $\sqrt{(\log N)/N}$ with optimal variable- and fixed-rate feedback codes,
respectively. We subsequently extend these results to parallel Rayleigh block fading sub-channels, where 
the feedback designates a set of sub-channels, which are activated with equal power. Rate-distortion 
feedback codes are proposed for designating subsets of (good) sub-channels with Signal-to-Noise Ratios 
(SNRs) that exceed a threshold. The associated performance is compared with that of a simpler lossless 
source coding scheme, which designates groups of good sub-channels, where both the group size and 
threshold are optimized. The rate-distortion codes can provide a significant increase in forward rate at 
low SNRs. 
distortion theory, Rayleigh fading, vector quantization.
I. Introduction

Multicarrier transmission techniques, including orthogonal frequency-division multiplexing (OFDM), 
provide a convenient way to exploit frequency diversity in multipath fading channels. Given the total 
transmit power, a substantial increase in the channel capacity can be achieved if the power allocation 
across the sub-channels is adapted to channel variations [1]. For example, consider the sum capacity 
of N independent block Rayleigh fading sub-channels with given total power or Signal-to-Noise Ratio 
(SNR). If the power is equally spread over all N sub-channels, the capacity is upper bounded by the
total SNR regardless of N, whereas if the power is allocated according to (optimal) water-filling, the
capacity increases as $O(\log N)$ as N increases [2], [3].

The state or quality of the sub-channels is typically measured at the receiver and sent to the transmitter 
through a feedback channel. We refer to this as channel state feedback (CSF). Obviously, optimal power 
allocation requires a prohibitive (infinite) amount of CSF in case of continuous channel state. Even if the 
channel state can be discretized, the number of sub-channels may exceed the total number of feedback 
bits. Hence, what state information to feed back and how to encode the feedback are important questions. 

This work studies the use of limited CSF for maximizing the achievable rate of multicarrier block 
fading channels. It is assumed that the sub-channel states are known or can be measured accurately at 
the receiver. The channel state is encoded using fewer than one bit per sub-channel and then sent to the 
transmitter through a noiseless feedback channel. The transmitter chooses a subset of sub-channels to 
activate based on the feedback. 

The problem of encoding the feedback is essentially a vector quantization (VQ) problem, where the 
channel state is mapped to a given number of bits for later reconstruction. Unlike the usual quantization 
problem, the reconstruction here is to produce a power loading vector for the sub-channels, where the 
distortion metric is the gap between the rate achieved using the feedback and the capacity achieved with 
known channel state at the transmitter. 

Multicarrier power allocation with limited-rate feedback has been previously considered in [2], [4]-[6]. 
In particular, [4] applies the Lloyd algorithm to produce a codebook of power loading vectors, which 
maximizes an objective such as achievable rate. Unfortunately, the size of the codebook in [4], and hence 
the search complexity, grows exponentially with the amount of feedback. Other heuristic schemes with 
one bit feedback per sub-channel have been proposed in [2], [3], [5]. 

This paper investigates the trade-off between the forward data rate and the amount of CSF for block- 
fading multicarrier channels assuming no more than one bit of feedback per sub-channel. Furthermore, 
in contrast with the lossless feedback source coding schemes analyzed in [2], here we consider the more
general class of lossy (rate-distortion) source codes. 

We first consider a model with two fading states only. Each sub-channel randomly assumes either a 
good or bad state during a coherence block. For the case of independent two-state sub-channels studied 
in Section II, the role of the feedback is to direct the transmitter to select as many good sub-channels as
possible to activate subject to the power constraint. The fundamental trade-off between the feedback rate 
and the sum capacity can be characterized using rate distortion theory in the limit of infinite number of 
sub-channels. For given finite number of sub-channels, we also quantify the gap between rates achievable 
by random coding and the rate distortion bound. Specifically, with variable-rate feedback codes the gap 
decreases as $(\log N)/N$, whereas with fixed-rate codes the gap decreases as $\sqrt{(\log N)/N}$.

We also compare the rate-distortion approach with a simple lossless source coding scheme, which 
reports as many good sub-channels as the feedback rate allows. Numerical plots show that good codes 
in the rate distortion sense typically achieve much higher forward rate. The result is then extended to
the case of correlated two-state sub-channels in Section III, where the sub-channel states are assumed
to form a Markov chain. Upper and lower bounds on the forward rate are derived as a function of the
feedback rate.

With the insights gained from the two-state channel model, we then study the problem of limited CSF 
for Rayleigh fading sub-channels. The fading coefficient, or state, of each sub-channel is a Circularly
Symmetric Complex Gaussian (CSCG) random variable during each coherence block. The case of
independent sub-channels is studied in Section IV, whereas the case of correlated sub-channels is discussed
in Section V. The state of each sub-channel is first reduced to a binary variable by comparing its gain
with a threshold. Feedback codes similar to those considered for the two-state channels are used to instruct the
transmitter which sub-channels to activate, assuming the power is distributed evenly over the activated 
sub-channels. The threshold is selected to maximize the forward rate given a fixed feedback rate. It 
turns out that the trade-off admits a similar characterization as that for two-state sub-channels. Although 
reduction of Rayleigh states to binary states induces loss, the scheme with optimized threshold and a 
moderate amount of feedback performs close to optimal water-filling with channel coefficients known 
at the transmitter. In particular, given a total power constraint, the scheme can achieve a forward rate, 
which has the same order of increase with the number of sub-channels as that of water-filling [2]. 
Two heuristic lossless source coding schemes for the reduced (two-state) version of the Rayleigh channel are
also considered for comparison in Section IV. In particular, in one of the schemes, taken from [7], the
sub-channels are divided evenly into groups and the feedback indicates the set of groups in which all 
sub-channel gains exceed the threshold. A binary state vector, indicating which groups to activate, is then
compressed using lossless source coding and fed back to the transmitter. The group size and threshold 
can be adjusted to maximize the forward achievable rate, subject to the feedback rate constraint. Such 
grouping, or clustering, of sub-channels to reduce feedback overhead has also been studied in [8] in 
a multiuser setting. Clustering sub-channels to reduce the training overhead and peak-to-average power 
ratio was previously studied in [9]. We characterize the growth in achievable rate with the number of 
sub-channels (for large N) as a function of the amount of feedback (which can also scale with N). 
Numerical examples show that the analytical results are quite accurate for finite-size systems of interest. 
In general, these heuristic schemes achieve a smaller forward rate than the rate-distortion schemes,
given a fixed feedback rate. 

II. Independent Two-state Sub-channels 

Consider a bank of N independent and statistically identical block fading sub-channels. During each 
coherence block, each sub-channel randomly takes one of two states, namely "good" and "bad," which 
is known to the receiver. The input is constrained such that up to a fraction p of the sub-channels can be 
activated by the transmitter. Suppose on average the amount of CSF is limited to Rf bits per sub-channel 
per coherence block. The problem is to design a feedback scheme to maximize the forward data rate, 
i.e., to activate as many good sub-channels as possible. 

A. The Fundamental Trade-off via Rate Distortion Theory 

Let the state of sub-channel i be denoted by a Bernoulli random variable^1 $S_i$, with the probability
of being a good state denoted as $P\{S_i = 1\} = q$. Further, let the power loading variable $\hat{S}_i = 1$ if
the i-th sub-channel is chosen to be activated and $\hat{S}_i = 0$ otherwise. Constrained by the feedback and
transmission power, a feedback scheme specifies a mapping from the set of binary channel state vectors
to a set of binary power loading vectors, whose Hamming weight is no greater than pN.

It is easy to see that the feedback scheme is no different than vector quantization, where the channel 
state vector $\mathbf{S} = [S_1, \ldots, S_N]$ is mapped to $N R_f$ bits for recovery at the transmitter. Constrained by the
feedback rate, the reconstruction may be prone to errors, and the quantization scheme should be designed 
to achieve as few errors (or, as small a distortion) in reconstruction as possible. 

^1 The following convention will be adopted throughout the paper: A boldface letter represents a vector. An uppercase letter
represents a random vector or variable (e.g., $\mathbf{S}$, $S_i$), and the corresponding lower case letter represents a specific realization
(e.g., $\mathbf{s}$, $s_i$). In addition, $\log(\cdot)$ denotes natural logarithm.
The fundamental trade-off of the forward and feedback rates as $N \to \infty$ can be addressed using rate
distortion theory. The source is a sequence of independent and identically distributed (i.i.d.) Bernoulli(q)
random variables, $S_1, S_2, \ldots$. The distortion measure can be described as
$d_N(\mathbf{s}, \hat{\mathbf{s}}) = \frac{1}{N} \sum_{i=1}^{N} d(s_i, \hat{s}_i)$ with

$$ d(s, \hat{s}) = 1_{\{s > \hat{s}\}} = \begin{cases} 1, & \text{if } s = 1 \text{ and } \hat{s} = 0, \\ 0, & \text{otherwise.} \end{cases} \qquad (1) $$



The metric accounts for missed opportunities, i.e., good sub-channels which are not activated, but does 
not penalize activation of bad sub-channels, which we refer to as misfires. Further, the power loading 
vector has to satisfy a normalized weight constraint: 

$$ w(\hat{\mathbf{s}}) = \frac{1}{N} \sum_{i=1}^{N} \hat{s}_i \le p. \qquad (2) $$
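As a small illustration of the distortion measure (1) and the weight constraint (2), the following sketch evaluates both on a toy state vector. The function names and the example vectors are hypothetical and chosen only for illustration; they do not appear in the paper.

```python
from typing import Sequence


def missed_opportunity_distortion(s: Sequence[int], s_hat: Sequence[int]) -> float:
    """Per-letter distortion averaged over N sub-channels.

    Counts only missed opportunities (good sub-channel, s_i = 1, left
    inactive, s_hat_i = 0). Misfires (s_i = 0, s_hat_i = 1) cost nothing.
    """
    if len(s) != len(s_hat) or not s:
        raise ValueError("state and loading vectors must have equal, nonzero length")
    return sum(1 for a, b in zip(s, s_hat) if a > b) / len(s)


def normalized_weight(s_hat: Sequence[int]) -> float:
    """Fraction of sub-channels activated by the power loading vector."""
    return sum(s_hat) / len(s_hat)


# Hypothetical example with N = 8 sub-channels.
state = [1, 0, 1, 1, 0, 0, 1, 0]    # 1 = good, 0 = bad
loading = [1, 1, 0, 1, 0, 0, 0, 0]  # 1 = activated

distortion = missed_opportunity_distortion(state, loading)
weight = normalized_weight(loading)
print(distortion, weight)
```

In this toy case two good sub-channels are missed and one bad sub-channel is activated; only the former contributes to the distortion, while the latter only consumes part of the weight budget p.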

This additional challenge of incorporating the weight constraint on the reconstruction distinguishes the 
problem from the classical rate distortion problem concerning i.i.d. source and single-letter distortion 
measure. Though not obvious, the rate distortion problem admits the following simple single-letter 
characterization. 

Theorem 1: For an i.i.d. Bernoulli(q) source, given the weight constraint on every binary reconstruction,
$w(\hat{\mathbf{s}}) \le p$, and the distortion measure $d(s, \hat{s}) = 1_{\{s > \hat{s}\}}$, the rate distortion function is

$$ R(D) = \min_{P_{\hat{S}|S}:\; E\,d(S,\hat{S}) \le D,\; P\{\hat{S}=1\} \le p} I(S; \hat{S}) \qquad (3) $$

where S ~ Bernoulli(q).

Proof: The achievability part of the theorem is based on Shannon's random coding technique (see
e.g., [10]). Fix $P_{\hat{S}|S}$ and some $0 < \delta_1 < p$, which satisfy $E\,d(S,\hat{S}) \le D$ and $P\{\hat{S} = 1\} \le p - \delta_1$. The
codebook of $2^{N R_f}$ codewords can be produced randomly with the marginal distribution $P_{\hat{S}}$. Further, an
exponentially small fraction of codewords which violate the weight constraint (2) are purged. It can be
shown that for sufficiently large code length N, the random codebook achieves the distortion D as long
as the rate $R > I(S; \hat{S}) + \delta_2$. The achievability part is thus proved because $\delta_1$ and $\delta_2$ can be chosen to
be arbitrarily small.

Showing the converse requires incorporating the weight constraint (2) into the standard technique of
[10]. Let $\hat{\mathbf{S}}$ represent the reconstruction of the random source vector $\mathbf{S}$. Consider any code of length N
with rate R which satisfies the distortion constraint and the average weight constraint $E[w(\hat{\mathbf{S}})] \le p$, which is weaker
than required in the theorem.^2 Then, due to the data processing theorem and the independence of the $S_i$,

$$ \begin{aligned}
NR &\ge I(\mathbf{S}; \hat{\mathbf{S}}) && (4) \\
&\ge \sum_{i=1}^{N} I(S_i; \hat{S}_i) && (5) \\
&\ge \min_{P_{\hat{\mathbf{S}}|\mathbf{S}}} \sum_{i=1}^{N} I(S_i; \hat{S}_i). && (6)
\end{aligned} $$

The key task here is to break down the constraints on the distribution of the vector $\hat{\mathbf{S}}$ in (6) into constraints
on the individual random variables. Note that $P_{\hat{S}}$ is linear in $P_{\hat{S}|S}$ because the source distribution $P_S$ is
fixed. An important fact is that $I(S_i; \hat{S}_i)$ is convex in the distribution $P_{\hat{S}_i|S_i}$. Because of the symmetry
in the indexes i, any optimal distribution $P_{\hat{S}|S}$ that achieves the minimum of (6) must be symmetric
over all indexes i. Otherwise replacing all of them by their average yields smaller mutual information.
Therefore, due to the symmetry and the additive nature of the constraints, (6) implies that the rate R is
lower bounded by R(D) given in (3). ■
The minimization over the conditional distribution $P_{\hat{S}|S}$ in (3) is equivalently over the crossover
probabilities:

$$ e_0 = P_{\hat{S}|S}(0|1) \quad \text{and} \quad e_1 = P_{\hat{S}|S}(1|0), \qquad (7) $$

where $e_0$ represents the probability of missing a good sub-channel. The mutual information $I(S; \hat{S})$ can
be expressed as the following function of $(e_0, e_1)$:

$$ i(e_0, e_1) \triangleq H_2(p) - q H_2(e_0) - (1 - q) H_2(e_1), \qquad (8) $$

where $H_2(\cdot)$ stands for the binary entropy function. Unless stated otherwise, the units of all information
metrics are bits. Note that the weight constraint (2) should be tight at the minimum because there is no
penalty on misfires. Thus the optimal crossover probabilities satisfy $q(1 - e_0) + (1 - q)e_1 = p$.

Let the capacity of a good sub-channel be $C_1$ and the capacity of a bad sub-channel be $C_0 < C_1$. The
average number of active good sub-channels is $Nq(1 - e_0)$. The trade-off between the capacity and the
feedback rate is characterized as follows.

Proposition 1: Given p, q, and the feedback rate $R_f$ bits per sub-channel per coherence block, the
maximum achievable forward data rate per sub-channel is

$$ C = q(1 - e_0^*)(C_1 - C_0) + p C_0, \qquad (9) $$

where the optimal proportion of missed good sub-channels $e_0^*$ is the solution to the following optimization
problem:

$$ \begin{aligned}
\text{minimize:} \quad & e_0 && (10a) \\
\text{subject to:} \quad & H_2(p) - q H_2(e_0) - (1 - q) H_2(e_1) \le R_f, && (10b) \\
& q(1 - e_0) + (1 - q) e_1 = p, && (10c) \\
& 0 \le e_0, e_1 \le 1. && (10d)
\end{aligned} $$

^2 Theorem 1 continues to hold even if the instantaneous input constraint (2) is replaced by an average constraint, namely
$E[w(\hat{\mathbf{S}})] \le p$.

January 1, 2009 DRAFT

The optimization problem (10) can be easily solved numerically. Clearly, the maximum forward data
rate increases as the feedback rate increases, but the return vanishes beyond a certain point. The minimum
feedback rate necessary for achieving the capacity can be determined by tentatively removing the feedback
constraint (10b). If $p > q$, one can activate all good sub-channels so that $e_0 = 0$ with $e_1 = \frac{p-q}{1-q}$, whereas
if $p < q$, then $e_0$ can be as small as $1 - \frac{p}{q}$ by choosing $e_1 = 0$. Substituting these values into (8), the
forward rate saturates at the maximum feedback rate,

$$ R = \begin{cases} H_2(p) - q H_2\!\left(\frac{p}{q}\right), & \text{if } p \le q, \\ H_2(1-p) - (1-q) H_2\!\left(\frac{p-q}{1-q}\right), & \text{if } p > q. \end{cases} \qquad (11) $$

For any $R_f < R$, the constraint (10b) is tight and $e_0^*$ can be calculated by solving the simultaneous
equations (10b) and (10c), which can be easily reduced to a fixed-point equation.
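The optimization in Proposition 1 can be sketched numerically as follows. This is a minimal illustration, not the authors' implementation: it eliminates $e_1$ through the weight equation (10c) and uses bisection on $e_0$ in place of the fixed-point iteration mentioned above, relying on the assumption that the feedback rate in (8) decreases monotonically as $e_0$ grows from its smallest feasible value to $1 - p$ (where the reconstruction is independent of the state and no feedback is needed). All function names are hypothetical.

```python
import math


def h2(x: float) -> float:
    """Binary entropy in bits; returns 0 outside the open interval (0, 1)."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1.0 - x) * math.log2(1.0 - x)


def feedback_rate(e0: float, p: float, q: float) -> float:
    """Mutual information (8) with e1 eliminated via q(1-e0) + (1-q)e1 = p."""
    e1 = (p - q * (1.0 - e0)) / (1.0 - q)
    return h2(p) - q * h2(e0) - (1.0 - q) * h2(e1)


def saturation_feedback_rate(p: float, q: float) -> float:
    """Feedback rate beyond which the forward rate no longer improves, cf. (11)."""
    return feedback_rate(max(0.0, 1.0 - p / q), p, q)


def optimal_miss_probability(p: float, q: float, rf: float, tol: float = 1e-12) -> float:
    """Smallest e0 meeting the feedback constraint (10b), found by bisection."""
    lo = max(0.0, 1.0 - p / q)  # smallest e0 allowed by the weight constraint
    hi = 1.0 - p                # zero-feedback point (loading independent of state)
    if feedback_rate(lo, p, q) <= rf:
        return lo
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if feedback_rate(mid, p, q) <= rf:
            hi = mid
        else:
            lo = mid
    return hi


def forward_rate(p: float, q: float, rf: float, c1: float, c0: float = 0.0) -> float:
    """Forward rate per sub-channel from (9)."""
    e0 = optimal_miss_probability(p, q, rf)
    return q * (1.0 - e0) * (c1 - c0) + p * c0


if __name__ == "__main__":
    # Parameters taken from the caption of Fig. 1: q = 0.3, C1 = 3, C0 = 0.
    for rf in (0.0, 0.1, 0.2, 0.4, 0.6):
        print(rf, forward_rate(p=0.4, q=0.3, rf=rf, c1=3.0))
```

With zero feedback the sketch reduces to random selection of a fraction p of the sub-channels, and with feedback above the saturation rate it returns the full-knowledge forward rate.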



B. Performance Bounds for Finite Number of Sub-channels 

Proposition 1 characterizes the asymptotic trade-off as the number of sub-channels N goes to infinity.
For a practical situation with finite N, the result needs refinement. Note that the solution to Proposition 1
provides an upper bound on the forward data rate for finite N, because the converse shown in the proof
of Theorem 1 holds for all N. In the following, we consider random feedback codes and derive a lower
bound for the achievable forward data rate with given feedback constraint.

1) Fixed-Length Constant-Composition Feedback Code: Note that the solution to Proposition 1 upper
bounds the forward data rate with the average input power constraint $E[w(\hat{\mathbf{S}})] = p$ and average feedback
rate of $N i(e_0, e_1)$ bits per coherence block. Here we impose two additional constraints without loss of
generality: 1) The binary reconstruction vectors have constant composition, that is, $w(\hat{\mathbf{s}}) = p$ for all the
vectors $\hat{\mathbf{s}}$ in the feedback codebook; and 2) The feedback is at most $N i(e_0, e_1)$ bits every coherence
block. The second restriction implies that there are at most $2^{N i(e_0, e_1)}$ codewords.

The following proposition gives a lower bound on the achievable forward rate given these additional 
constraints. 

Proposition 2: Let the number of feedback bits per coherence block be $N i(e_0, e_1)$, where $e_0, e_1 > 0$ and
$e_0, e_1 \ne 0.5$ are the solution to (10). Then there exists $N_a < \infty$ such that for $N > N_a$, the ergodic capacity achieved
with the fixed-length constant-composition feedback code is lower bounded as

g(JV 
Nq 



C fixed > )C (12) 



where C is given by (9).

The proof is given in Appendix I. The proposition implies that, for large enough N, the difference
between the upper bound (9) and achievable forward rate per sub-channel approaches zero at the rate
$O(\sqrt{(\log N)/N})$. This further implies that the sum rate across the N sub-channels incurs a loss, which
increases as $O(\sqrt{N \log N})$ compared to NC. The proof basically follows the random coding technique
of Goblick [11] for analyzing the convergence rate of the rate distortion function for general sources
and fixed-length block codes. The contribution in this work is to incorporate the additional constant-
composition constraint and to simplify the analysis by exploiting the binary structure of the source and
the reconstruction.^3

Although the result in Proposition 2 is stated for large N, a more refined lower bound is derived in
Appendix I, which holds for any finite N.

2) Variable-Length Feedback Codes: We note that fixed-length codes can cover a subset of most 
probable channel state vectors, but are unable to adapt to deviations from typical channel conditions. 
In the following, we analyze the performance of variable-length feedback codes. A variable amount of 
feedback is allowed during each coherence block as long as the average number of feedback bits is 
$N i(e_0, e_1)$. The instantaneous power constraint is replaced by an average power constraint.

^3 We avoid the use of complicated partition functions in [11] by using a Chernoff bound to evaluate the tail probability
distributions.

Fig. 1. Forward rate versus feedback rate for different input constraints corresponding to VQ and sub-optimal feedback schemes
using lossless source coding with channel state reduction (curves labeled with "LSC"). The sub-channels are assumed to be
independent. Other parameters are q = 0.3, $C_1 = 3$ and $C_0 = 0$. Also shown is the lower bound on forward rate corresponding
to N = 500 and a variable-length feedback code (dotted lines).

Proposition 3: Let the average number of feedback bits per coherence block be $N i(e_0, e_1)$, where
$e_0, e_1 > 0$ are the solution to Proposition 1 and $e_0, e_1 \ne 1/2$. Then there exists $N_b < \infty$ such that for every $N > N_b$,
the ergodic forward rate achieved with a variable-length feedback code, under an average input power
constraint $E[w(\hat{\mathbf{S}})] = p$, is lower bounded as

° wr ~ i 1 ~ qiH^eoH H^)] 10 ^) C (13) 

where C is given in (9).

The proof is provided in Appendix II. We follow the technique given by Pinkston [12], albeit with
slight modifications. The main difference lies in incorporating the average input power constraint and
avoiding the use of partition functions by exploiting the binary structure of the state and power loading
vectors.

The proposition says that in this scenario the forward rate converges to the upper bound (9) as
$O((\log N)/N)$. This is a substantial improvement over the $O(\sqrt{(\log N)/N})$ convergence rate achieved
by the fixed-length constant composition feedback codes (cf. (12)). Although the result in Proposition 3
is stated for large N, a more refined lower bound is derived in Appendix II, which holds for any finite
N. Fig. 1 plots a few instances of this lower bound against the upper bound (9) as the feedback rate
varies. Clearly, the lower bound is fairly close to the optimal forward rate (9) and becomes tighter as the
feedback rate increases.
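To give a feel for the difference between the two convergence orders, the short sketch below tabulates $(\log N)/N$ and $\sqrt{(\log N)/N}$ for a few values of N. It deliberately ignores the multiplicative constants in Propositions 2 and 3, so the numbers indicate scaling only and are not the actual rate gaps; the function name is hypothetical.

```python
import math


def gap_orders(n: int) -> tuple[float, float]:
    """Return the orders ((log n)/n, sqrt((log n)/n)) with constants omitted.

    The first entry corresponds to variable-length feedback codes, the
    second to fixed-length constant-composition codes. Natural logarithm
    is used, following the paper's convention.
    """
    variable_length = math.log(n) / n
    fixed_length = math.sqrt(math.log(n) / n)
    return variable_length, fixed_length


if __name__ == "__main__":
    for n in (50, 500, 5000):
        v, f = gap_orders(n)
        print(f"N={n:5d}  (log N)/N={v:.4f}  sqrt((log N)/N)={f:.4f}")
```

For any N above 2 the first quantity is below one and hence smaller than its square root, which is consistent with the faster convergence reported for variable-length codes.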

C. Practical CSF Codes 

Until now, we have shown that for moderate to large N, the rate distortion trade-off can be approached 
using random codes. Such a code, however, is not practical. As mentioned above, the Lloyd algorithm can
be used to design a near-optimal vector quantizer for a small number of sub-channels (see [4], [13]-[15]
and references therein). Such a task becomes infeasible with tens or hundreds of sub-channels, as is the 
case in many applications. 

One practical solution in the case of a large number of sub-channels is to use a graphical code 
similar to a low-density parity-check (LDPC) code. Encoding and decoding of the source (channel state 
vector) are respectively analogous to iterative decoding and encoding of a graphical error-control code. 
The complexity of such a code is in general linear in the number of sub-channels. For a discussion of 
graphical codes for source coding, the reader is referred to [16]-[18]. It is more challenging to design
and implement variable-length codes. 

D. A Sub-optimal Scheme: Lossless Source Coding with Channel State Reduction 

For comparison, we also consider a feedback scheme using simple channel state reduction and lossless
source coding in lieu of vector quantization (henceforth referred to as the "LSC" scheme for convenience).
If the feedback rate is greater than the entropy rate of the channel state vector, i.e., $R_f > H_2(q)$, then any
lossless code such as the Huffman code basically suffices. If the feedback rate is less than the entropy
rate, we consider a simple scheme which reports a fraction f of good sub-channels, where f is chosen
such that the entropy rate $H_2(fq)$ is basically $R_f$. On average the transmitter is informed of fqN good
sub-channels. The forward rate achieved with this option is

$$ C_f = \begin{cases} fq\,C_1 + (p - fq)\,\dfrac{q(1-f)\,C_1 + (1-q)\,C_0}{q(1-f) + (1-q)}, & \text{if } fq < p, \\ p\,C_1, & \text{otherwise.} \end{cases} \qquad (14) $$

The expression follows directly from the observation that if fewer than a fraction p of the sub-channels
are reported as good (that is, $fq < p$), the remaining $p - fq$ fraction of the sub-channels are chosen at
random so that the probability of transmitting on a good sub-channel is given by $(q - fq)/(1 - fq)$.

Fig. 2. The number of misfires and missed opportunities (normalized by the total number of sub-channels N) versus the
feedback rate for different input constraints. The sub-channels are assumed to be independent. Other parameters are q = 0.3,
$C_1 = 3$ and $C_0 = 0$.

Note that it might be more efficient for the receiver to inform the transmitter to avoid a subset of
bad sub-channels than to report a subset of good sub-channels, depending on the parameters. Suppose a
fraction f of the bad sub-channels are reported to the transmitter where $H_2(f(1 - q)) = R_f$. The forward
rate achieved with this option is given by

$$ \bar{C}_f = \begin{cases} p\,\dfrac{q\,C_1 + (1-f)(1-q)\,C_0}{q + (1-f)(1-q)}, & \text{if } p < q + (1-f)(1-q), \\ q\,C_1 + (p - q)\,C_0, & \text{otherwise.} \end{cases} \qquad (15) $$

The maximum forward data rate achievable by the LSC scheme is therefore $\max\{C_f, \bar{C}_f\}$.
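The two LSC options can be evaluated with a few lines of code. The sketch below is an illustration under stated assumptions rather than the authors' procedure: the reported fraction is obtained by inverting the binary entropy on [0, 1/2] with bisection, the full state is assumed to be conveyed once $R_f$ reaches $H_2(q)$, and unreported sub-channels are filled in uniformly at random as described above. All function names are hypothetical.

```python
import math


def h2(x: float) -> float:
    """Binary entropy in bits; returns 0 outside the open interval (0, 1)."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1.0 - x) * math.log2(1.0 - x)


def h2_inverse(r: float) -> float:
    """Solve h2(x) = r for x in [0, 0.5] by bisection (r assumed in [0, 1])."""
    lo, hi = 0.0, 0.5
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if h2(mid) < r:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def lsc_rate_report_good(p, q, rf, c1, c0):
    """Option 1: report a fraction f of the good sub-channels, with x = f*q."""
    x = q if rf >= h2(q) else h2_inverse(rf)
    if x >= p:
        return p * c1  # enough reported good sub-channels to fill the budget
    # Remaining p - x budget is spent at random on the 1 - x unreported ones.
    return x * c1 + (p - x) * ((q - x) * c1 + (1.0 - q) * c0) / (1.0 - x)


def lsc_rate_report_bad(p, q, rf, c1, c0):
    """Option 2: report a fraction f of the bad sub-channels, with y = f*(1-q)."""
    y = (1.0 - q) if rf >= h2(q) else h2_inverse(rf)
    remaining = 1.0 - y  # sub-channels not flagged as bad
    if p < remaining:
        return p * (q * c1 + (1.0 - q - y) * c0) / remaining
    return q * c1 + (p - q) * c0


def lsc_forward_rate(p, q, rf, c1, c0=0.0):
    """Best of the two LSC options."""
    return max(lsc_rate_report_good(p, q, rf, c1, c0),
               lsc_rate_report_bad(p, q, rf, c1, c0))


if __name__ == "__main__":
    # Parameters from the figure captions: q = 0.3, C1 = 3, C0 = 0.
    for rf in (0.0, 0.2, 0.4, 0.6, 0.9):
        print(rf, lsc_forward_rate(p=0.4, q=0.3, rf=rf, c1=3.0))
```

Which option wins depends on p, q and the feedback rate, which is why the sketch evaluates both and takes the maximum.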
E. Numerical Results 

We study the asymptotic performance of the optimal VQ scheme (given by (9)) and the sub-optimal LSC
scheme (given by (14) and (15)). Fig. 1 plots the forward rate per sub-channel versus the feedback rate $R_f$
for different values of the input power constraint p. The LSC scheme is clearly inferior compared to the
asymptotic VQ scheme with infinite N as well as the variable-length VQ scheme at N = 500 sub-channels.
At small values of feedback, the VQ scheme gives substantial gains (up to 100%). The asymptotic result
is quite representative of the performance with a relatively large number of sub-channels (N = 500). As
expected, the forward rate increases with the feedback amount and saturates at $R_f = H_2(q)$, at which
point all good sub-channels can be reported at no loss.

The gain achieved with VQ can be better understood by studying the corresponding numbers of
missed opportunities and misfires shown in Fig. 2. Intuitively, the larger the values of $e_0$ and $e_1$, the more
"distortion" or "errors" we allow in the rate-distortion feedback code and hence the smaller the
amount of feedback it requires. This is clearly reflected in (8). Consider the case of p = 0.4 (> q) in Fig. 2: $e_0 = 0$
and the fraction of misfires $(1 - q)e_1 = 0.1$, which implies that we allow enough misfires so that the
required feedback rate is kept small. In contrast, with p = 0.4 we never report a bad sub-channel as good
for the LSC scheme, and thereby incur extra feedback overhead. The reverse holds for p = 0.2 (< q),
where the optimal scheme allows enough missed opportunities ($e_0$ is large and $e_1$ is small) so that the
required feedback rate is again small. Since in this case $e_1 \approx 0$, bad channels are not reported as good.

III. Correlated Two-State Sub-channels 

In multicarrier systems, the states of the sub-channels are often correlated. Consider the same system as 
in Section II except that the binary (good/bad) channel states of the N sub-channels, $S_1, S_2, \ldots, S_N$, form
a stationary Markov chain. Designing the optimal feedback scheme is nonetheless a vector quantization problem,
with its asymptotic performance characterized by the following rate distortion result.

Theorem 2: Given a stationary binary Markov source $\{S_i\}$, a weight constraint on every binary
reconstruction $w(\hat{\mathbf{s}}) \le p$, and the single-letter distortion measure $d(s, \hat{s}) = 1_{\{s > \hat{s}\}}$, the rate distortion
function is given by

$$ R(D) = \limsup_{N \to \infty} \; \min_{P_{\hat{\mathbf{S}}|\mathbf{S}}:\; \sum_i P\{\hat{S}_i = 1\} \le pN,\; \sum_i E[d(S_i, \hat{S}_i)] \le DN} \; \frac{1}{N} I(\mathbf{S}; \hat{\mathbf{S}}). \qquad (16) $$

It is straightforward to prove Theorem 2 using the same techniques as developed in [19], with the
additional weight constraint for the reconstruction. Hence the proof is omitted. Random codes achieve the
rate distortion function. However, in practice, graphical codes can be designed to approach the optimal
trade-off. In addition, Theorem 2 continues to hold if the instantaneous input constraint $w(\hat{\mathbf{s}}) \le p$ is
replaced by an average input constraint $E[w(\hat{\mathbf{S}})] \le p$.

Note that (16) involves minimization over the conditional distribution of the entire power loading
vector, and hence is not a single-letter characterization of the rate distortion function. Calculating the
rate distortion function for correlated sources is a hard problem in general. The solution is known only
in a few special cases pertaining to source alphabets, correlation models and distortion measures [19].
Even for a symmetric binary Markov chain and Hamming distortion, the rate distortion function is
exactly known only for very small distortion values [20]. In the CSF problem, the reconstruction $\hat{\mathbf{S}}$ is
a binary hidden Markov process. There is no known closed-form expression for the entropy rate of such
processes, although there exist approximations and numerical results in some cases (see, e.g., [21], [22]
and references therein).



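Since the exact solution to the optimization problem (16) is difficult, we will next find upper and lower
bounds on the rate for a given distortion. Due to the stationarity of the source, the optimal conditional
probability $P_{\hat{\mathbf{S}}|\mathbf{S}}$ is also stationary. Consider any stationary process $\{(S_i, \hat{S}_i) \mid i = 0, \pm 1, \ldots\}$ which
satisfies the constraints in (16), i.e., $P\{\hat{S}_i = 1\} \le p$ and $E[d(S_i, \hat{S}_i)] \le D$. Then

$$ \begin{aligned}
\lim_{N \to \infty} \frac{1}{N} I(\mathbf{S}; \hat{\mathbf{S}}) &= \lim_{N \to \infty} \frac{1}{N} \sum_{i=1}^{N-1} I(S_i; \hat{\mathbf{S}} \mid S_0^{i-1}) && (17) \\
&= H(S_1 | S_0) - \lim_{N \to \infty} \frac{1}{N} \sum_{i=1}^{N-1} H(S_i \mid \hat{\mathbf{S}}, S_0^{i-1}) && (18) \\
&\ge H(S_1 | S_0) - H(S_1 | \hat{S}_1, S_0) && (19) \\
&= I(S_1; \hat{S}_1 | S_0) && (20)
\end{aligned} $$

where (19) is because of stationarity and because conditioning decreases the entropy. The rate distortion
function can thus be lower bounded as

$$ R(D) \ge \min_{P_{\hat{S}_1 | S_1, S_0}} I(S_1; \hat{S}_1 | S_0) \qquad (21) $$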



with $P_{\hat{S}_1 | S_1, S_0}$ satisfying the constraints in (16). The bounding mutual information depends only on the
following four probabilities for the given source: $q_{s_0 s_1} = P_{\hat{S}_1, S_0, S_1}(0, s_0, s_1)$ with $s_0, s_1 = 0$ or 1. Denote
the lower bound by $i_l(q_{00}, q_{01}, q_{10}, q_{11}) = I(S_1; \hat{S}_1 | S_0)$. This bound can be expressed as a function of
the crossover probabilities denoted by $\delta_{01} = P\{S_i = 1 | S_{i \pm 1} = 0\}$ and $\delta_{10} = P\{S_i = 0 | S_{i \pm 1} = 1\}$ for
all i. (The probability of a sub-channel being good is then $q = \delta_{01}/(\delta_{01} + \delta_{10})$.) An explicit expression
for the lower bound is derived in Appendix III.
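The source statistics entering these bounds are easy to compute. The sketch below evaluates the stationary probability of a good sub-channel and the entropy rate $H(S_1|S_0)$ of the two-state Markov chain from its transition probabilities, and compares the latter with the i.i.d. entropy $H_2(q)$. The function names are hypothetical, and the numerical values (q = 0.3 and a good-to-bad transition probability of 0.3) are the ones quoted for Fig. 3.

```python
import math


def h2(x: float) -> float:
    """Binary entropy in bits; returns 0 outside the open interval (0, 1)."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1.0 - x) * math.log2(1.0 - x)


def stationary_good_probability(d01: float, d10: float) -> float:
    """q = d01 / (d01 + d10), where d01 = P(bad -> good), d10 = P(good -> bad)."""
    return d01 / (d01 + d10)


def transition_to_good(q: float, d10: float) -> float:
    """Solve q = d01 / (d01 + d10) for d01, given q and d10."""
    return q * d10 / (1.0 - q)


def markov_entropy_rate(d01: float, d10: float) -> float:
    """Entropy rate H(S_1 | S_0) of the stationary two-state chain, in bits."""
    q = stationary_good_probability(d01, d10)
    return (1.0 - q) * h2(d01) + q * h2(d10)


if __name__ == "__main__":
    q, d10 = 0.3, 0.3
    d01 = transition_to_good(q, d10)
    print("d01 =", d01)
    print("entropy rate =", markov_entropy_rate(d01, d10))
    print("i.i.d. entropy H2(q) =", h2(q))
```

Because conditioning cannot increase entropy, the entropy rate of the correlated chain never exceeds $H_2(q)$, which is the intuition behind correlation reducing the feedback requirement.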



In terms of these joint probabilities, the fraction of sub-channels that are good and are correctly
reported as good is given by $(q - q_{01} - q_{11})$ and the total fraction of sub-channels reported as good is
given by $1 - (q_{00} + q_{01} + q_{10} + q_{11})$. Consequently, an upper bound on the forward achievable rate can
be obtained as the solution to the following optimization problem:

$$ \begin{aligned}
\text{maximize:} \quad & (C_1 - C_0)(q - q_{01} - q_{11}) + p C_0 && (22a) \\
\text{subject to:} \quad & 1 - (q_{00} + q_{01} + q_{10} + q_{11}) \le p && (22b) \\
& i_l(q_{00}, q_{01}, q_{10}, q_{11}) \le R_f && (22c) \\
& 0 \le q_{00}, q_{01}, q_{10}, q_{11} \le 1 && (22d)
\end{aligned} $$

which can be easily solved numerically.

In order to find an upper bound on the rate distortion function, we restrict the minimization over $P_{\hat{\mathbf{S}}|\mathbf{S}}$
in (16) to be a minimization over a finite-dimensional distribution. For example, suppose that conditioned
on $S_i$ and $S_{i+1}$, the random variable $\hat{S}_i$ is independent of all the remaining random variables in $(\mathbf{S}, \hat{\mathbf{S}})$. By
stationarity and the Markovian property, the joint distribution of $(\mathbf{S}, \hat{\mathbf{S}})$ is determined by the conditional
distribution $P_{\hat{S}_0 | S_0, S_1}$. Then it can be shown that, conditioned on $S_{i+1}$, $S_{i-1}$, $\hat{S}_{i-1}$ and $\hat{S}_i$, the variable $S_i$
is also independent of all the remaining random variables in $\mathbf{S}$ and $\hat{\mathbf{S}}$. Consequently,

$$ \begin{aligned}
H(S_i \mid \hat{\mathbf{S}}, S_0^{i-1}) &\ge H(S_i \mid \hat{\mathbf{S}}, S_0^{i-1}, S_{i+1}) && (23) \\
&= H(S_i \mid S_{i-1}, S_{i+1}, \hat{S}_{i-1}, \hat{S}_i). && (24)
\end{aligned} $$

Substituting in (18), an upper bound for the rate distortion function is obtained as the solution to the
following optimization problem:

$$ R(D) \le \min_{P_{\hat{S}_0 | S_0, S_1}} I(S_1; S_2, \hat{S}_0, \hat{S}_1 \mid S_0) \qquad (25) $$

where $P_{\hat{S}_0 | S_0, S_1}$ is also subject to the constraints in (16). A simpler but looser bound is obtained if we
assume that $\hat{S}_i$ is independent of everything else conditioned on $S_i$, for which the mutual information in
(25) becomes $I(S_1; S_2, \hat{S}_1 | S_0)$, and the minimization is over $P_{\hat{S}_1 | S_1}$. Again, let the crossover probabilities
be given by (7). The upper bound is a function of these two crossover probabilities and is denoted by
$i_u(e_0, e_1) = I(S_1; S_2, \hat{S}_1 | S_0)$. An explicit expression is derived in Appendix III. Consequently, a lower
bound on the forward achievable rate can be obtained by solving an optimization problem similar to that
in Proposition 1 with the constraint (10b) replaced by $i_u(e_0, e_1) \le R_f$.



It is easily seen that if no reconstruction errors are allowed, then both the upper and lower bounds reduce to the entropy rate of the channel state process. In other words, if $H(S|\hat{S}) = 0$, or equivalently, $\epsilon_0 = \epsilon_1 = 0$, $q_{10} = q_{00} = 1$ and $q_{01} = q_{11} = 0$, then $i_l(1,0,1,0) = i_u(0,0) = H(S_1|S_0)$. Next we provide an example in which the upper bound is not tight. Choosing $p = 1$ implies that the power loading vector $\hat{S}$ can be chosen to be all ones, which achieves the capacity with zero feedback rate. However, equivalently choosing $\epsilon_0 = 0$ and $\epsilon_1 = 1$ gives the upper bound on the required feedback rate $i_u(0, 1) > 0$.

Fig. 3 plots the upper and lower bounds on achievable forward rate per sub-channel versus $R_f$ for different values of $q$ with $\delta_{10} = 0.3$ and $p = 0.3$. Consider the plot for $q = 0.3$. There is a substantial gap between the two bounds for small feedback rates. However, as the feedback rate increases, the gap closes and the bounds provide an accurate measure of the performance of the VQ scheme with correlated sub-channels. Also shown is the performance of the VQ scheme with independent sub-channels. Clearly, correlation improves the forward rate by decreasing the feedback requirement.

Later, in Section V, the preceding methodology will be used to derive bounds on the performance of VQ schemes for correlated Rayleigh fading sub-channels.

IV. Independent Rayleigh fading sub-channels 

The design of a limited-rate CSF strategy for Rayleigh fading sub-channels is again a VQ problem. Unfortunately, the exact distortion measure, which corresponds to the capacity-maximizing power loading vectors, is difficult to work with [4], [13]-[15]. In order to simplify the problem, we focus on threshold-based schemes, which convert the sequence of Rayleigh fading sub-channels to a sequence of "good" (sub-channel gain above the threshold) and "bad" (sub-channel gain below the threshold) sub-channels. This enables the use of the limited feedback schemes developed for two-state sub-channels. It will be shown that, as the number of sub-channels $N \to \infty$, the rate achieved with such a scheme grows at the same rate as that of water-filling with full channel state information at the transmitter.

The limited feedback problem here differs from the case of two-state sub-channels studied in Sections II and III in two key aspects. First, the threshold, which determines the fraction of sub-channels that are considered good, needs to be optimized. Second, given the total power, the fraction of sub-channels to activate also influences the amount of power in each active sub-channel.

A. System Model 

Consider a multicarrier channel with $N$ independent and statistically identical sub-channels, where the channel output for the $i$-th sub-channel is written as

\[ Y_i = H_i X_i + Z_i \tag{26} \]

where $H_i$ and $Z_i$ are zero-mean circularly symmetric complex Gaussian (CSCG) random variables. Without loss of generality, we assume that the channel and noise variances are one, that is, $E[|H_i|^2] = E[|Z_i|^2] = 1$. Also, the noise is assumed to be independent across the sub-channels. The $N \times 1$ input vector $X = [X_1, X_2, \ldots, X_N]^{T}$ satisfies the average total signal-to-noise ratio (SNR) constraint $E[X^\dagger X] \le P$. The channel vector $H = [H_1, H_2, \ldots, H_N]^{T}$ is assumed to be known perfectly at the receiver. We assume a block fading model so that $H$ remains constant for $T$ channel uses and then changes to an independent value. The time dependence is suppressed to simplify notation.

B. Optimal Threshold Based VQ 

The gain for the $i$-th sub-channel, $|H_i|^2$, is exponentially distributed with mean equal to one. Given a threshold $t > 0$, define the $N \times 1$ binary state vector $S$ so that the $i$-th entry $S_i = 1$ if $|H_i|^2 > t$, and $S_i = 0$ otherwise. The probability of a sub-channel being "good" is denoted as $q = P\{S_i = 1\} = P\{|H_i|^2 > t\} = e^{-t}$.
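The thresholding rule above can be illustrated with a short simulation. The following is only a sketch: the function names, the sample size, the seed and the threshold value are illustrative assumptions and do not come from the paper. It draws unit-mean exponential gains, maps them to binary states, and compares the empirical fraction of good sub-channels with $e^{-t}$.

```python
import math
import random


def rayleigh_gains(n, rng):
    """Draw n i.i.d. unit-mean exponential gains, i.e. |H_i|^2 for unit-variance CSCG H_i."""
    return [rng.expovariate(1.0) for _ in range(n)]


def binary_states(gains, t):
    """Map gains to states: 1 ("good") if the gain exceeds t, otherwise 0 ("bad")."""
    return [1 if g > t else 0 for g in gains]


rng = random.Random(0)      # fixed seed, chosen only for reproducibility
t = 1.2                     # hypothetical threshold
gains = rayleigh_gains(200_000, rng)
states = binary_states(gains, t)

q_empirical = sum(states) / len(states)
q_theory = math.exp(-t)     # q = P{|H_i|^2 > t} = e^{-t}
print(f"empirical q = {q_empirical:.4f}, e^-t = {q_theory:.4f}")
```

The large sample size is used only to make the empirical frequency stable; it is unrelated to the number of sub-channels $N$ used in the numerical results.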

Suppose that, on average, the transmitter transmits over, or activates, a fraction $p$ of the sub-channels. The power is distributed uniformly over the active sub-channels so that each transmission occurs with SNR equal to $P/(Np)$. Therefore, the expected capacities of a good and a bad sub-channel, respectively, are given by

\[ C_1 = \frac{1}{q}\int_t^\infty e^{-\tau}\log\left(1 + \frac{P\tau}{Np}\right)d\tau \tag{27} \]

and

\[ C_0 = \frac{1}{1-q}\int_0^t e^{-\tau}\log\left(1 + \frac{P\tau}{Np}\right)d\tau. \tag{28} \]
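The two conditional expectations in (27) and (28) are easy to evaluate numerically. The sketch below is one possible way to do so, under stated assumptions: the logarithm is taken to base 2 (rates in bits), the helper names are hypothetical, and a plain midpoint rule is used rather than any method from the paper. The tail integral over $(t, \infty)$ is mapped to $(0,1)$ through the substitution $\tau = t - \ln u$.

```python
import math


def _midpoint(f, a, b, n=4000):
    """Midpoint-rule approximation of the integral of f over (a, b)."""
    h = (b - a) / n
    return h * sum(f(a + (k + 0.5) * h) for k in range(n))


def good_bad_capacity(t, snr_per_active):
    """Return (C1, C0) in bits for threshold t > 0.

    snr_per_active plays the role of P / (N p), the SNR on each active sub-channel.
    """
    q = math.exp(-t)

    def rate(tau):
        return math.log2(1.0 + snr_per_active * tau)

    # Integral over (t, inf) of e^{-tau} rate(tau): substitute tau = t - ln(u), u in (0, 1),
    # which turns it into e^{-t} times the integral of rate(t - ln u) over (0, 1).
    tail = q * _midpoint(lambda u: rate(t - math.log(u)), 0.0, 1.0)
    # Integral over (0, t) of e^{-tau} rate(tau), evaluated directly.
    head = _midpoint(lambda tau: math.exp(-tau) * rate(tau), 0.0, t)
    return tail / q, head / (1.0 - q)


c1, c0 = good_bad_capacity(t=1.0, snr_per_active=0.4)
print(f"C1 = {c1:.4f} bits, C0 = {c0:.4f} bits")
```

A useful sanity check is that $qC_1 + (1-q)C_0$ is the unconditional expectation of the rate and therefore does not depend on the threshold when the per-sub-channel SNR is held fixed.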

Assume that on average $R_f$ bits per sub-channel per coherence block are available for error-free CSF. Also, define $B = NR_f$ as the average amount of feedback summed across all sub-channels. Similar to the case of two-state sub-channels, the power loading with limited CSF can be seen as a mapping from the space of channel state vectors $S$ to the space of power loading vectors $\hat{S}$, where $\hat{S}_i = 1$ if the $i$-th sub-channel is activated and $\hat{S}_i = 0$ otherwise.

We note a key difference between the VQ problem at hand and the usual stationarity assumption in rate distortion theory: the optimal choice of the threshold $t$ here may vary with the total number of sub-channels, and hence so do the statistics of the binary source, characterized by the probability $q$. Nonetheless, the following asymptotic bound on achievable distortion (equivalently, forward rates) can be established.

Proposition 4: Given $N$ parallel Rayleigh fading sub-channels and an average of $R_f$ bits of feedback per sub-channel per coherence block, the following statements hold.

a) The forward rate per sub-channel achieved with a threshold-based feedback scheme is upper bounded by $C$, the maximized objective in the following optimization problem:

\begin{align}
\text{maximize:}\quad & C = q(1 - \epsilon_0)(C_1 - C_0) + pC_0 \tag{29a}\\
\text{subject to:}\quad & H_2(p) - qH_2(\epsilon_0) - (1 - q)H_2(\epsilon_1) \le R_f \tag{29b}\\
& q(1 - \epsilon_0) + (1 - q)\epsilon_1 = p \tag{29c}\\
& 0 \le \epsilon_0, \epsilon_1 \le 1 \tag{29d}
\end{align}

with $q = e^{-t}$, and where the maximization is over $p$, $t$ and $\epsilon_0$.

b) There exist fixed-length constant composition feedback codes which achieve the forward rate per sub-channel given by (12) with sufficiently large $N$, where $C$ and $q$ are obtained from solving the optimization problem in part (a).

c) There exist variable-length feedback codes which achieve the forward rate per sub-channel given by (13) with sufficiently large $N$, where $C$, $q$, $\epsilon_0$ and $\epsilon_1$ are obtained by solving the optimization problem in part (a).
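To make the roles of the constraints in part (a) concrete, the following sketch evaluates the objective (29a), the feedback requirement on the left-hand side of (29b), and the activated fraction in (29c) for given parameters. The function names are hypothetical, and $C_1$ and $C_0$ are passed in as plain numbers rather than computed from (27) and (28); this is an illustration of the expressions, not a solver for the optimization problem.

```python
import math


def h2(x):
    """Binary entropy in bits, with H2(0) = H2(1) = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1.0 - x) * math.log2(1.0 - x)


def active_fraction(q, e0, e1):
    """Fraction p of activated sub-channels, from constraint (29c)."""
    return q * (1.0 - e0) + (1.0 - q) * e1


def required_feedback(q, e0, e1):
    """Left-hand side of (29b): H2(p) - q H2(e0) - (1 - q) H2(e1), in bits per sub-channel."""
    p = active_fraction(q, e0, e1)
    return h2(p) - q * h2(e0) - (1.0 - q) * h2(e1)


def forward_rate(q, e0, e1, c1, c0):
    """Objective (29a) for given expected capacities c1 (good) and c0 (bad)."""
    return q * (1.0 - e0) * (c1 - c0) + active_fraction(q, e0, e1) * c0


q = 0.3
print(required_feedback(q, 0.0, 0.0))   # no errors: equals H2(q)
print(required_feedback(q, 0.6, 0.4))   # activation independent of state: about 0
```

Two limiting cases are worth noting. With $\epsilon_0 = \epsilon_1 = 0$ the feedback requirement equals $H_2(q)$, the cost of describing the state vector without error. With $1 - \epsilon_0 = \epsilon_1 = p$ the activation decision is independent of the channel state and the requirement is zero.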

Part (a) in Proposition 4 follows directly from Fano's inequality, and its proof is similar to that of Proposition 1. Note that the upper bound $C$ holds for any finite $N$. The lower bounds in parts (b) and (c) can be proved similarly to Propositions 2 and 3, respectively, corresponding to two-state sub-channels. The proofs are omitted. Although the lower bounds are stated for large $N$, more accurate expressions that apply to any finite $N$ are derived in Appendices I and II.

Fig. 4. Forward rate versus feedback rate for different feedback schemes for N = 500 sub-channels at SNR P = 20 dB. For comparison, the water-filling capacity with full channel state information at receiver is 0.385 bits per sub-channel use.

Interestingly, (12) and (13) imply that the rate at which the lower bounds approach $C$ depends on the average number of good sub-channels $Nq$, which can be much smaller than the total number of sub-channels $N$. The upper and lower bounds on forward rate versus total feedback $B = NR_f$ per coherence block are shown in Fig. 4 for SNR $P = 20$ dB.$^{4}$ Only the lower bound corresponding to a variable-length variable composition feedback code is shown. Unless specified otherwise, here and in the subsequent numerical results we let $N = 500$. The plots show that the upper and lower bounds are quite close (within 10%). The corresponding optimal threshold is shown in Fig. 5. The number of good sub-channels, the number of missed opportunities and the number of misfires as functions of the amount of feedback are shown in Fig. 6. Interestingly, for $B > 70$ bits, the threshold does not change significantly, but the numbers of missed opportunities and misfires adapt to accommodate additional feedback. Also shown in Fig. 4 are curves corresponding to other simpler sub-optimal feedback schemes to be described in subsequent sections.

4 The forward rate is measured in bits per sub-channel per channel use whereas $B$ is the total number of feedback bits per coherence block. Since typically the coherence block is several hundred channel uses, the results in Fig. 4 correspond to the practical regime in which the feedback rate is much smaller than the forward data rate.

5 Numerical results for an SNR of 27 dB and $N = 500$ (curves not shown here) show that from the case of no feedback to the case of full feedback (water-filling power allocation) the change in capacity is merely 16%. This is consistent with the understanding that adaptive power allocation does not help much at high SNRs. Of course, the gains from the various power allocation schemes presented here increase with decreasing SNR.

Fig. 5. Optimal threshold versus feedback for different feedback schemes with SNR = 20 dB and N = 500.

Fig. 6. Optimal number of sub-channels that exceed the threshold together with the optimal number of missed opportunities and misfires corresponding to the VQ feedback scheme for independent Rayleigh fading sub-channels. Other parameters are SNR = 20 dB and N = 500.

C. Lossless Coding of Feedback with Threshold Adjustment 

In this section, we consider an alternative feedback scheme using lossless source coding of reduced channel states. As in Section IV-B, a threshold $t$ is used to control the fraction of sub-channels qualifying as good (or bad) in order to meet the limited feedback constraint. All good (or bad) sub-channels are then reported to the transmitter using entropy coding. The feedback per coherence block required by this scheme is essentially $NH_2(q)$ bits with $q = e^{-t}$. Note that the feedback constraint $H_2(q) \le R_f$ can be met by choosing either $q < 1/2$ or $q > 1/2$. The optimal choice corresponds to the one that maximizes the forward rate

\[ C(t) = N\int_t^\infty e^{-\tau}\log\left(1 + \frac{Pe^{t}}{N}\tau\right)d\tau. \tag{30} \]

Fig. 4 plots the forward rate achieved with this scheme versus the feedback per coherence block at 20 dB. For small to moderate feedback, this scheme performs worse than the optimal VQ scheme described in the previous section. For higher feedback rates, the performance of the two schemes converges. The optimal threshold versus $B$ for 20 dB is shown in Fig. 5. For small amounts of feedback the optimal threshold $t$ is close to zero. Furthermore, Figs. 4 and 5 show that once the feedback crosses a certain threshold ($B \approx 170$ bits here), the optimal threshold decreases and the capacity increases with the amount of feedback since more good sub-channels can be reported with additional feedback. As $B$ increases further, the threshold decreases monotonically to an asymptotic value, and the capacity reaches its maximum value at around $B = B_{\max} \approx 440$ bits per coherence block. More feedback beyond this value cannot be utilized by the threshold-based scheme.$^{6}$
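Since $H_2(q)$ is symmetric about $q = 1/2$, a given feedback rate $R_f < 1$ is met with equality by two thresholds, one with $q < 1/2$ and one with $q > 1/2$. The sketch below finds both by bisection. The function names are hypothetical, and the example value $R_f = 170/500$ is chosen only to echo the feedback level mentioned above; which of the two thresholds is preferable would still have to be decided by comparing the resulting forward rates.

```python
import math


def h2(x):
    """Binary entropy in bits, with H2(0) = H2(1) = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1.0 - x) * math.log2(1.0 - x)


def lossless_thresholds(rf, tol=1e-12):
    """Return (t_high, t_low) such that H2(e^{-t}) = rf.

    t_high corresponds to q < 1/2 (few good sub-channels are reported),
    t_low corresponds to q > 1/2 (few bad sub-channels are reported).
    """
    if not 0.0 < rf < 1.0:
        raise ValueError("rf must lie strictly between 0 and 1 bit per sub-channel")

    def solve_q(lo, hi, increasing):
        # Bisection on an interval where h2 is monotone.
        while hi - lo > tol:
            mid = 0.5 * (lo + hi)
            if (h2(mid) < rf) == increasing:
                lo = mid
            else:
                hi = mid
        return 0.5 * (lo + hi)

    q_small = solve_q(0.0, 0.5, increasing=True)    # h2 increases on (0, 1/2)
    q_large = solve_q(0.5, 1.0, increasing=False)   # h2 decreases on (1/2, 1)
    return -math.log(q_small), -math.log(q_large)


t_high, t_low = lossless_thresholds(170 / 500)
print(f"t_high = {t_high:.3f}, t_low = {t_low:.3f}")
```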

The asymptotic forward rate versus feedback performance of this scheme (assuming that the amount of feedback and number of sub-channels go to infinity) is discussed in [2] and hence the details are omitted here. A more refined analysis characterizing the asymptotic growth rate of the maximum achievable forward rate and $B_{\max}$ as a function of $N$ is given in Section IV-E.

6 The additional bits could be used to increase the number of quantization levels for the power on active sub-channels. The corresponding increase in rate, relative to the one-bit quantization assumed here, is typically quite small [2].



D. Group-Based Power Loading 

Another feedback scheme for comparison is based on sub-channel groups as well as threshold-based state reduction, which is an enhancement of the scheme discussed in Section IV-C. Such a scheme was originally proposed in [23] for reducing feedback in downlink orthogonal frequency division multiple access (OFDMA) systems. The idea is to divide the sub-channels into $G$ nonoverlapping groups, each containing $m = N/G$ consecutive sub-channels. Given a threshold $t$, the receiver informs the transmitter to use only those groups in which all sub-channel gains exceed $t$. The probability of this event is $e^{-mt}$, so that for large $N$ the average amount of feedback required per coherence block for this scheme can be compressed to the entropy rate $GH_2(e^{-mt})$, which should not exceed the feedback constraint $B$.
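The feedback cost of the group-based scheme can be compared with that of threshold adjustment alone (group size one) in a few lines. This is a minimal sketch with hypothetical function names and illustrative parameter values; it only evaluates the entropy expression stated above and does not optimize $m$ or $t$.

```python
import math


def h2(x):
    """Binary entropy in bits, with H2(0) = H2(1) = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1.0 - x) * math.log2(1.0 - x)


def group_active_probability(m, t):
    """Probability that all m unit-mean exponential gains in a group exceed t."""
    return math.exp(-m * t)


def group_feedback_bits(n, m, t):
    """Average feedback G * H2(e^{-m t}) in bits, for G = n / m groups of size m."""
    return (n / m) * h2(group_active_probability(m, t))


n, t = 500, 0.2                                 # illustrative values only
ungrouped = group_feedback_bits(n, 1, t)        # m = 1: threshold adjustment alone
grouped = group_feedback_bits(n, 5, t)          # groups of five sub-channels
print(f"m = 1: {ungrouped:.1f} bits, m = 5: {grouped:.1f} bits")
```

For the same threshold, grouping reduces the feedback both because fewer symbols are described and because the group state can be more skewed, at the price of activating fewer sub-channels.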

1) Asymptotic Rate Versus Feedback: Assuming that the transmitter codes across coherence blocks in frequency and time, the achievable rate is given by the average mutual information (ergodic capacity)$^{7}$

\[ C(m,t) = Ne^{-mt}\int_t^\infty e^{t-\tau}\log\left(1 + \frac{Pe^{mt}}{N}\tau\right)d\tau. \tag{31} \]

Note that the rate (31) does not depend on the coherence block length $T$. We wish to choose the feedback parameters $m$ and $t$ to maximize $C(m,t)$ subject to the feedback constraint $GH_2(e^{-mt}) \le B$. Although it appears to be difficult to obtain an analytical characterization of the solution for arbitrary $N$, the solution for large $N$ and $B$ can be characterized as follows.

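Proposition 5: For fixed signal-to-noise ratio $P$, as $N \to \infty$ and $B \to \infty$ with $N$, the capacity (31) optimized over $t$ and $m$ is given by$^{8}$

\[
C = \begin{cases}
\sqrt{\dfrac{PB}{u^*}}\,\log(1 + u^*) + o(1) & \text{if } B < B_1 \\[2ex]
\dfrac{B}{\log(N/B)}\log\left(1 + \dfrac{P}{B}\log^2\left(\dfrac{N}{B}\right)\right) + O(1) & \text{if } B_1 \le B < B_{\max} \\[2ex]
P\left(\log N - (1 + \eta_2)\log\log N\right) + O(1) & \text{if } B \ge B_{\max}
\end{cases}
\tag{32}
\]

where $B_1 = \frac{P}{u^*}(\log N)^{2-\eta_1}$, $B_{\max} = P(\log N)^{2+\eta_2}$, and $u^* \approx 3.92$ is the positive solution to $\log(1+u) = 2u/(1+u)$. In addition, $\eta_1 \in (0,2)$ and $\eta_2 \in (0,1)$ are functions of $N$ such that $\eta_1 \to 0$ and $\eta_2 \to 1$ as $N \to \infty$.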

7 A somewhat more conservative rate is obtained by selecting the code rate assuming that all active sub-channel gains satisfy $|H_i|^2 = t$ [23]. This does not change the asymptotic results in Section IV-D.

8 As $N \to \infty$ and $B \to \infty$ with $N$, $o(1)$ is vanishingly small and $O(1)$ is bounded by a finite constant.

A sketch of the proof is given in Appendix IV. The appendix also provides optimal threshold and group sizes as functions of $B$ and $N$, and expressions for $\eta_1$ and $\eta_2$.
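The constant $u^*$ quoted in Proposition 5 can be checked numerically. The sketch below solves $\log(1+u) = 2u/(1+u)$ for its positive root by bisection, using the natural logarithm; the function name and the bracketing interval $[1, 10]$ are choices made here for illustration. As a side remark, the same equation is the stationarity condition of $\log(1+u)/\sqrt{u}$, which the last lines verify numerically; this observation is offered only as a consistency check, not as a summary of the proof in Appendix IV.

```python
import math


def solve_u_star(lo=1.0, hi=10.0, tol=1e-12):
    """Positive root of log(1 + u) = 2u / (1 + u), found by bisection on [lo, hi]."""

    def f(u):
        return math.log(1.0 + u) - 2.0 * u / (1.0 + u)

    # f is negative at u = 1, positive at u = 10, and increasing for u > 1.
    if not (f(lo) < 0.0 < f(hi)):
        raise ValueError("interval does not bracket the root")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


u_star = solve_u_star()
print(f"u* = {u_star:.4f}")


def g(u):
    """log(1 + u) / sqrt(u), whose derivative vanishes exactly where the equation above holds."""
    return math.log(1.0 + u) / math.sqrt(u)


# g should be no larger slightly to either side of u_star than at u_star itself.
is_local_max = g(u_star) >= g(u_star - 0.1) and g(u_star) >= g(u_star + 0.1)
```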

The capacity expressions in Proposition 5 are good approximations when $N$ is a few hundred and $B$ is a few tens of bits. In fact, given a fixed, large enough $N$, the results can be understood as follows: In the range of relatively small to moderate amounts of feedback $B$ (specifically, $P/u^* \ll B < (P/u^*)(\log N)^{2-\eta_1}$) the capacity is proportional to $\sqrt{B}$. If $B$ is greater than $(P/u^*)(\log N)^{2-\eta_1}$, the capacity increases with $B$, albeit comparatively slowly. Finally, the forward rate does not increase when $B$ exceeds $P(\log N)^{2+\eta_2}$. The corresponding maximum achievable rate is given by (32), which is roughly $P\log N$. However, the negative second-order (log log) term in (32) can be substantial, as will be seen in the subsequent numerical examples. A specific numerical example will be provided in Section IV-D.3.



2) Optimal Threshold and Group Size: Expressions for optimal values of the threshold $t$ and group size $m$ as functions of $N$ and $B$ are derived in Appendix IV. Here we outline the main characteristics of the optimized parameters.

As expected, when the feedback is in the small to moderate range, $B < (P/u^*)(\log N)^{2-\eta_1}$, the optimal group size $m^* > 1$. In this range, the threshold increases with feedback and is proportional to $\sqrt{B}$. The exact expressions for $m$ and $t$ are given by (76) and (77). For $B > (P/u^*)(\log N)^{2-\eta_1}$, the optimal group size $m^* = 1$. Hence, for this range of feedback, the group-based scheme reduces to the previous scheme described in Section IV-C. Interestingly, here the threshold is a decreasing function of $B$ and is given by (81). As the feedback increases, decreasing the threshold beyond a certain value decreases the capacity. It is shown in Appendix IV that the optimal threshold corresponds to feedback $B_{\max} = P(\log N)^{2+\eta_2}$ and is slightly smaller than $\log N$. The exact expression is given by (33).

3) Numerical Examples: The preceding asymptotic results are illustrated in Fig. 7. The optimized group size, threshold, and corresponding capacity are plotted as functions of the amount of feedback. These results are obtained by optimizing the original capacity expression (31) subject to the feedback constraint.$^{9}$ Fig. 7 also shows the asymptotic analytical results. (The values plotted here are refined versions of the expressions presented in Proposition 5 and are derived in Appendix IV.) The plot shows that the asymptotic values are close to the values obtained from numerical optimization. As predicted by Proposition 5, the plot shows that as $B$ increases from zero, the group size decreases, and the threshold increases. However, once the group size reaches one (when $B$ is about 40 bits/coherence block), the threshold decreases with $B$ and the capacity increases relatively slowly. Finally, for large amounts of feedback (say, greater than 135 bits/coherence block for this example) the capacity and threshold saturate, and increasing the feedback further does not improve performance. Referring to (79)-(83), these values correspond to $\eta_1 \approx \eta_2 \approx 0.25$.

9 Of course, in practice $m$ can assume only positive integer values, as opposed to the real values obtained from the optimization, which are shown in Fig. 7.

Fig. 7. Comparison of numerically optimized values and asymptotic values versus feedback per coherence block for N = 500 and P = 5 dB SNR. For comparison, the water-filling capacity is 14.93 bits per channel use.

4) Performance comparison: The optimized forward rate (31) versus $B$ for the group-based scheme is shown in Fig. 4. At 20 dB SNR, grouping sub-channels can provide about 15% gain over threshold adjustment alone for small to moderate feedback rates. As the feedback increases, the two schemes become the same (since the group size converges to $m = 1$ at around $B \approx 340$ bits). Recall that the advantage of the group-based scheme over threshold adjustment alone is limited to the feedback range of $B < (P/u^*)(\log N)^{2-\eta_1}$. The optimal threshold for the group-based scheme is shown in Fig. 5. Observe that the optimal thresholds with and without grouping converge for $B > 340$. The thresholds behave in strikingly different manners for the two schemes when the feedback is smaller.

It is also seen in Fig. 4 that the VQ scheme performs substantially better than both LSC schemes, with and without grouping, for small to moderate amounts of feedback. Namely, VQ saves between 100 and 150 feedback bits per coherence block over a wide range of target forward rates.

E. Growth in achievable forward rate

In this section we highlight several common features of all three schemes discussed in Section IV. With a sufficient amount of feedback, all three schemes correspond to the optimal "on-off" power allocation in which the power is uniformly spread over active channels [2], [24]. The optimal threshold is given by

\[ t^* = \left[\log N - (1 + \eta_2)\log\log N - \log P\right] + o(1) \tag{33} \]

and the corresponding capacity is given by (32) for $B \ge B_{\max}$ (see Appendix IV).



From (83) in Appendix IV, as $N \to \infty$, $\eta_2 \to 1$, so that (32) states that $O(\log^3 N)$ feedback can achieve the optimal $O(\log N)$ growth in achievable rate. This result has been previously presented in [2], which considers the same threshold-based feedback scheme considered here without grouping ($m = 1$). Also, the numerical examples given in the previous section show that for reasonable values of $N$, the amount of feedback needed to achieve the $O(P\log N)$ forward rate may be closer to $P\log^2 N$ than to $P\log^3 N$.

At the other extreme of small feedback, $B \to 0$, the three schemes also perform similarly and the forward rate converges to the SNR $P$ (see Fig. 4). This limit is the ergodic capacity of a Rayleigh fading channel without feedback when the bandwidth becomes large, i.e., $N \to \infty$. For the VQ scheme, as $B \to 0$, the optimal parameters converge to either $t \to 0$, $\epsilon_0 \to 0$, or $t \to \infty$, $\epsilon_1 \to 1$, both of which imply the same transmission strategy. Clearly, for threshold adjustment without grouping, the optimal threshold $t^* \to 0$ as feedback becomes small. Finally, although Proposition 5 does not cover the case of finite $B$, it is easy to show that for the group-based scheme, as the amount of feedback $B \to 0$, we have $m \to N$, $t \to 0$ and the achievable rate $C \to P$.

V. Correlated Rayleigh fading sub-channels 

In this section, we remove the assumption that the Rayleigh fading sub-channels are independent. Suppose the sequence of complex sub-channel coefficients is a Gauss-Markov process generated by the following first-order autoregressive model,

\[ H_i = \alpha H_{i-1} + \sqrt{1 - \alpha^2}\,W_i, \quad i = 2, \ldots, N, \tag{34} \]

where $W_i$ are i.i.d. zero-mean CSCG random variables with unit variance, and $\alpha \in (0, 1)$ represents the correlation between the sub-channels. Each sub-channel gain $|H_i|^2$ is exponentially distributed. The sequence of sub-channel gains $\{|H_i|^2\}$ is a Markov process with joint second-order probability density function given by

\[ g(x,y) = \frac{1}{1-\alpha^2}\exp\left(-\frac{x+y}{1-\alpha^2}\right)I_0\left(\frac{2\alpha\sqrt{xy}}{1-\alpha^2}\right) \tag{35} \]

where $I_0(\cdot)$ is the modified Bessel function of the first kind and zero order.
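The autoregressive model (34) is straightforward to simulate, which gives a quick check that the recursion preserves unit variance and produces the intended correlation between adjacent sub-channels. The sketch below assumes that the first coefficient $H_1$ is itself a unit-variance CSCG sample; the function names, seed and sequence length are illustrative choices, and the long sequence is used only to stabilize the sample averages.

```python
import math
import random


def cscg(rng):
    """One zero-mean, unit-variance circularly symmetric complex Gaussian sample."""
    s = math.sqrt(0.5)
    return complex(rng.gauss(0.0, s), rng.gauss(0.0, s))


def gauss_markov_channel(n, alpha, rng):
    """Generate H_1, ..., H_n with H_i = alpha * H_{i-1} + sqrt(1 - alpha^2) * W_i."""
    h = [cscg(rng)]                      # assumption: H_1 is unit-variance CSCG
    scale = math.sqrt(1.0 - alpha ** 2)
    for _ in range(n - 1):
        h.append(alpha * h[-1] + scale * cscg(rng))
    return h


rng = random.Random(1)                   # fixed seed for reproducibility
alpha = 0.6
h = gauss_markov_channel(200_000, alpha, rng)

# Sample mean of the gains |H_i|^2 (should be close to one).
mean_gain = sum(abs(x) ** 2 for x in h) / len(h)
# Sample estimate of E[H_i conj(H_{i-1})] (should be close to alpha).
lag1 = sum((h[i] * h[i - 1].conjugate()).real for i in range(1, len(h))) / (len(h) - 1)
print(f"mean gain = {mean_gain:.3f}, lag-1 correlation = {lag1:.3f}")
```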

Again, given a threshold $t > 0$, the state vector $S$ is defined such that $S_i = 1$ if $|H_i|^2 > t$ and $S_i = 0$ otherwise. Also, $q = P\{S_i = 1\} = e^{-t}$. The sequence $\{S_i\}$ is a hidden Markov process rather than i.i.d. Nonetheless, for a fixed $p$ and $t$, the rate-distortion trade-off is still given by (16). However, obtaining upper and lower bounds on the required feedback rate (16), analogous to the two-state Markov model of Section III, seems to be difficult due to the hidden Markov structure of $S$.

We proceed by assuming that the receiver approximates the sequence $\{S_i\}$ as a first-order Markov chain. Ignoring the higher-order correlation in $S$ results in a larger feedback requirement (or equivalently, an upper bound on $R(D)$ in (16)). Using this upper bound as the feedback rate then gives an achievable forward rate (a lower bound on capacity). The transition probabilities for the first-order Markov model are

\begin{align}
\delta_{10} &= \Pr\{S_i = 0 \mid S_{i\pm 1} = 1\} = 1 - \frac{1}{q}\int_t^\infty\!\!\int_t^\infty g(x,y)\,dx\,dy, \tag{36}\\
\delta_{01} &= \Pr\{S_i = 1 \mid S_{i\pm 1} = 0\} = \frac{q\,\delta_{10}}{1-q}. \tag{37}
\end{align}



With the first-order Markov approximation, the problem reduces to the one discussed in Section III. Consequently, an achievable forward rate can be computed by solving the optimization problem (29) over $p$, $t$, $\epsilon_0$ and $\epsilon_1$ with $q = e^{-t}$ and the constraint (10b) replaced by $i_u(\epsilon_0, \epsilon_1) \le R_f$. An expression for $i_u(\epsilon_0, \epsilon_1)$ is given in Appendix III.



Fig. 8 plots this rate versus feedback per coherence block $B = NR_f$ for $\alpha = 0.6$ and 20 dB SNR. Also shown in Fig. 8 is the forward rate achieved with the simpler LSC feedback scheme that encodes the state vector $S$ and controls the amount of feedback by adjusting the threshold without grouping. Allowing no errors in the reconstructed channel state vector means that the minimum required feedback rate is $H(S)$, which is upper bounded by $H(S_1|S_0) = qH_2(\delta_{10}) + (1-q)H_2(\delta_{01})$. This feedback scheme was considered for correlated sub-channels in [24]. In Fig. 8, VQ performs substantially better at small to moderate feedback rates. The forward rate values were also computed for $\alpha = 0.9$ (not shown in Fig. 8), but there the comparison is inconclusive, since the lower bound on forward rate achieved with VQ is very close to the forward rate achieved with the LSC scheme.
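The first-order Markov bound on the lossless feedback rate is simple to evaluate once $q$ and $\delta_{10}$ are known. The sketch below takes both as inputs (it does not evaluate the double integral of the joint density), obtains $\delta_{01}$ from the stationarity balance $q\,\delta_{10} = (1-q)\,\delta_{01}$, and returns $qH_2(\delta_{10}) + (1-q)H_2(\delta_{01})$. Function names and numerical values are illustrative assumptions.

```python
import math


def h2(x):
    """Binary entropy in bits, with H2(0) = H2(1) = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1.0 - x) * math.log2(1.0 - x)


def delta01_from_balance(q, delta10):
    """Transition probability bad -> good implied by stationarity: q * d10 = (1 - q) * d01."""
    d01 = q * delta10 / (1.0 - q)
    if not 0.0 <= d01 <= 1.0:
        raise ValueError("q and delta10 are not consistent with a two-state Markov chain")
    return d01


def markov_feedback_bound(q, delta10):
    """First-order Markov bound q * H2(d10) + (1 - q) * H2(d01), in bits per sub-channel."""
    d01 = delta01_from_balance(q, delta10)
    return q * h2(delta10) + (1.0 - q) * h2(d01)


q = 0.3
independent = markov_feedback_bound(q, 1.0 - q)   # no correlation: d10 = 1 - q
correlated = markov_feedback_bound(q, 0.3)        # positively correlated states
print(f"independent: {independent:.4f} bits, correlated: {correlated:.4f} bits")
```

When the states are independent, $\delta_{10} = 1 - q$ and the bound reduces to $H_2(q)$; positive correlation lowers it, which is the mechanism by which correlation reduces the feedback requirement.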

The behavior of the optimal threshold and crossover probabilities $\epsilon_0$, $\epsilon_1$ is the same as for independent sub-channels and hence is not shown here. Similar to the independent sub-channels case, we can again apply the group-based scheme to correlated sub-channels. Performance evaluation appears to be difficult, but the performance of the group-based scheme should be inferior to that of the VQ scheme and at least as good as that without grouping.

Fig. 8. Achievable forward rate versus the feedback rate for a VQ scheme with correlated fading sub-channels. Also shown is the forward rate achieved with the threshold adjustment feedback scheme without grouping. Other parameters are N = 500, P = 20 dB SNR. For comparison, the water-filling capacity with full channel state information at receiver is 0.385 bits per sub-channel use (the same as in Fig. 4).



VI. Conclusions 

We have studied limited feedback of channel states for multicarrier systems with two-state and Rayleigh 
sub-channels. The asymptotic performance has been characterized, using rate distortion theory, along 
with bounds on the performance loss with a finite number of sub-channels. For Rayleigh channels, the 
threshold-based VQ scheme shows a substantial improvement over lossless coding of reduced channel 
states based on thresholding, especially at low to moderate SNRs. Of course, this benefit comes at the 
price of higher source coding complexity (e.g., using graphical codes). 

Our results have assumed perfect channel knowledge at the receiver and have neglected the feedback overhead. In practice, the combined overhead for channel estimation and feedback compromises the benefits of feedback. This becomes more important as the channel coherence time decreases. A model that accounts for feedback overhead in a multicarrier time-division duplex system is presented in [25]. There the overhead is optimized, assuming the lossless feedback code presented in Section IV-D. Training and feedback overhead in the context of beamforming has been studied in [26], [27]. That approach may also be appropriate for the multicarrier scenario considered here.

We have also assumed a noiseless feedback link. For the schemes considered here a noisy feedback link 
requires additional overhead in the form of channel coding or higher transmit power. Other alternatives 
include analog CSF (e.g., see [28], [29]), which gives noisy estimates at the transmitter, and sending a 
pilot signal from receiver to transmitter at the beginning of each coherence block [30] (assuming channel 
reciprocity applies). Comparative advantages and disadvantages of these schemes remain to be studied. 

Finally, the results presented here can conceivably be extended to more elaborate system and channel 
models, such as continuous fading, instead of block fading (e.g., see [31], [32]), and Multiple-Input 
Multiple-Output (MIMO) OFDM. Such systems typically operate at low SNRs per antenna (or coefficient) 
and hence can benefit substantially from adaptive power loading [33], [34]. Limited CSF schemes for 
downlink and uplink OFDMA are presented in [23], [35], [36] and the references therein. (See also the 
comprehensive survey of the limited feedback literature in [37].) Extensions of the VQ scheme presented here to those settings are also left for future work.

Appendix I 
Proof of Proposition [2] 

First we introduce some notation, then describe the construction of a fixed-length, constant-composition feedback code, and finally give the performance bounds as a function of $N$.

Let $V_s(1)$ represent the empirical probability of ones in a length-$N$ binary channel state vector $s$. Namely, $V_s(1) = q$ implies that $\sum_{i=1}^N s_i = qN$. A similar definition holds for $V_{\hat{s}}(1)$. Furthermore, for any pair of vectors $(s, \hat{s})$, $V_{\hat{s}|s}(0|1)$ represents the empirical probability of zeroes in $\hat{s}$ at the positions where $s$ has ones. In other words, $V_{\hat{s}|s}(0|1) = \epsilon_0$ implies that $\sum_{i=1}^N 1_{\{\hat{s}_i = 0,\, s_i = 1\}} = \epsilon_0 qN$. Let $\mathcal{T}_{S;q}$ represent the set of all constant composition length-$N$ channel state vectors $s$ which have $qN$ ones, that is, $\mathcal{T}_{S;q} = \{s : V_s(1) = q\}$. Similarly, define the set of constant composition power loading vectors $\mathcal{T}_{\hat{S};p} = \{\hat{s} : V_{\hat{s}}(1) = p\}$.

Let the feedback codebook be a subset $\mathcal{T}_{\hat{S}|S} \subset \mathcal{T}_{\hat{S};p}$. The amount of feedback required per coherence block is therefore $\log_2|\mathcal{T}_{\hat{S}|S}|$ bits. A vector $\hat{s} \in \mathcal{T}_{\hat{S}|S}$ is said to cover $s \in \mathcal{T}_{S;q}$ if $V_{\hat{s}|s}(0|1) = \epsilon_0$ and $V_{\hat{s}|s}(1|0) = \epsilon_1$. Note that, depending on the size of the codebook, not all the vectors in $\mathcal{T}_{S;q}$ might be covered. The feedback occurs as follows: Every channel realization vector is matched to the closest (in Hamming distance) typical vector $s \in \mathcal{T}_{S;q}$. The receiver then looks up an $\hat{s}$ in $\mathcal{T}_{\hat{S}|S}$ which covers $s$ and feeds back the index of $\hat{s}$. If there is no such $\hat{s}$ then a random index is fed back. Therefore, the elements of the codebook $\mathcal{T}_{\hat{S}|S}$ should be chosen carefully to minimize the distortion or, equivalently, maximize the forward rate. Next, along the lines of [11], we will evaluate the average performance assuming that the subset $\mathcal{T}_{\hat{S}|S}$ is chosen at random. Then by the usual argument we can claim that there must exist a structured fixed-length constant composition codebook that does at least as well.

Consider the performance of the codebook as a function of $N$. The type covering lemma [38] suggests that we can find a codebook with size $|\mathcal{T}_{\hat{S}|S}| = g(N)2^{i(\epsilon_0,\epsilon_1)N}$, where $g(N)$ is a polynomial, such that every vector in $\mathcal{T}_{S;q}$ is covered. Here, we use random coding arguments to get an estimate of $g(N)$. Suppose $\mathcal{T}_{\hat{S}|S}$ consists of $M$ vectors that are randomly and independently$^{10}$ drawn from $\mathcal{T}_{\hat{S};p}$.

Given a vector $s \in \mathcal{T}_{S;q}$, the probability that it will not be covered by the $M$ randomly chosen vectors is given by

\[ p_n = [1 - p_c]^M \le e^{-Mp_c} \tag{38} \]

where

\[ p_c = \frac{\binom{qN}{\epsilon_0 qN}\binom{(1-q)N}{\epsilon_1(1-q)N}}{\binom{N}{pN}}. \tag{39} \]

Further using Robbins' approximation [38], [39] for the factorial

\[ \sqrt{2\pi}\,n^{n+\frac{1}{2}}e^{-n+\frac{1}{12n+1}} < n! < \sqrt{2\pi}\,n^{n+\frac{1}{2}}e^{-n+\frac{1}{12n}}, \quad n \ge 1, \tag{40} \]

it is straightforward to show that

\[ p_c \ge N^{-1/2}\,2^{-K_2}\,2^{-i(\epsilon_0,\epsilon_1)N} \tag{41} \]

where the mutual information $i(\epsilon_0, \epsilon_1)$ is given by (8) and $K_2 = \log_2(\sqrt{2\pi}e^{5/12}) - \frac{1}{2}\log_2(p(1-p))$ is a constant. Now choosing $M = (\log N)N^{1/2}2^{K_2}2^{i(\epsilon_0,\epsilon_1)N}$ and combining (38) and (41) gives $p_n \le 1/N$. Therefore, with this codebook the feedback rate per sub-channel per coherence block, given by

\[ \frac{\log_2|\mathcal{T}_{\hat{S}|S}|}{N} = \frac{\log_2 M}{N} = i(\epsilon_0,\epsilon_1) + \frac{\log_2 N + 2K_2 + 2\log_2(\log N)}{2N}, \tag{42} \]

converges to the mutual information $i(\epsilon_0, \epsilon_1)$.
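The random-coding argument above can be illustrated with exact binomial coefficients instead of the factorial bounds. The sketch below computes the covering probability $p_c$ from integer counts, picks the smallest $M$ with $Mp_c \ge \ln N$ (so that $(1-p_c)^M \le 1/N$), and reports $\log_2 M / N$ next to the mutual information. The function names and the example parameters ($q = 1/2$, $\epsilon_0 = \epsilon_1 = 1/5$, hence $p = 1/2$) are assumptions made for illustration, and the choice of $M$ here uses the exact $p_c$ rather than the lower bound (41).

```python
import math


def h2(x):
    """Binary entropy in bits, with H2(0) = H2(1) = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1.0 - x) * math.log2(1.0 - x)


def covering_probability(n_good, n_bad, n_miss, n_misfire):
    """Probability that one uniformly drawn constant-composition codeword covers a given state.

    n_miss good positions are left inactive and n_misfire bad positions are activated.
    """
    n_active = (n_good - n_miss) + n_misfire
    favourable = math.comb(n_good, n_miss) * math.comb(n_bad, n_misfire)
    return favourable / math.comb(n_good + n_bad, n_active)


def codebook_size(n, p_c):
    """Smallest M with M * p_c >= ln(n), which guarantees (1 - p_c)^M <= 1/n."""
    return math.ceil(math.log(n) / p_c)


mutual_info = 1.0 - h2(0.2)      # H2(p) - q H2(e0) - (1 - q) H2(e1) for q = p = 1/2, e0 = e1 = 1/5
rates, miss_probs = [], []
for n in (20, 100, 500):
    n_good = n // 2
    n_bad = n - n_good
    p_c = covering_probability(n_good, n_bad, n_good // 5, n_bad // 5)
    m = codebook_size(n, p_c)
    rates.append(math.log2(m) / n)
    miss_probs.append(math.exp(m * math.log1p(-p_c)))   # (1 - p_c)^M
    print(f"N = {n}: log2(M)/N = {rates[-1]:.4f}, i = {mutual_info:.4f}")
```

In this example the feedback rate decreases toward the mutual information as $N$ grows, in line with (42).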
10 We have $\binom{N}{pN}$ choices in each drawing.

Next we bound the average forward rate achieved by this codebook. For this we need to account for the variations in the channel gain vector $S$. Consider the set of channel gain vectors with fraction of ones in the range $(q(1-\epsilon), q(1+\epsilon))$, that is, $\mathcal{T}^{\epsilon}_{S;q} = \{s : V_s(1) \in (q(1-\epsilon), q(1+\epsilon))\}$. Using Chernoff's inequality [40], the following is easily seen:

\[ \Pr\left\{\left|\sum_{i=1}^N S_i - qN\right| > q\epsilon N\right\} \le p_u = 2\exp\left(-\frac{qN\epsilon^2}{4}\right), \quad 0 < \epsilon < 2(1 \ldots) \tag{43} \]



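which in turn implies that $\Pr\{\mathcal{T}^{\epsilon}_{S;q}\} \ge 1 - p_u$. Using the definition of $\epsilon$ above, the ergodic capacity is lower bounded as

\begin{align}
C_{\text{fixed}} &\ge q(1 - \epsilon_0 - \epsilon)(C_1 - C_0)(1 - p_u)(1 - p_n) + pC_0 \tag{44}\\
&\ge C - C_1\left[\epsilon + p_u + \frac{1}{N}\right] \tag{45}
\end{align}

where $C = q(1 - \epsilon_0)(C_1 - C_0) + pC_0$. The loss factors $1 - p_n$ and $1 - p_u$ in (44) account for the fact that channel gain vectors might not be covered or might not be in the set $\mathcal{T}^{\epsilon}_{S;q}$, respectively. Further, choosing $\epsilon = \sqrt{2\log(Nq)/(Nq)}$ gives a tight lower bound as

\[ C_{\text{fixed}} \ge C - C_1\left(\frac{\sqrt{2\log(Nq)} + 2}{\sqrt{Nq}} + \frac{1}{N}\right). \tag{46} \]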



In summary, since (46) bounds the average performance of a randomly generated codebook with $M$
codewords, there must exist a codebook of size $M$ that performs at least as well as this lower bound.
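The concentration step behind (43) can be sanity-checked numerically. The sketch below is only illustrative: it assumes the Chernoff bound is taken in the form $p_u = 2\exp(-qN\epsilon^2/4)$ (the constant $1/4$ is an assumption), uses the choice $\epsilon = \sqrt{2\log(Nq)/(Nq)}$ with natural logarithms, and compares the bound with the exact binomial probability that the fraction of good sub-channels falls outside $(q(1-\epsilon), q(1+\epsilon))$. All function names are hypothetical.

```python
import math

def binom_pmf(n, k, q):
    """P(X = k) for X ~ Binomial(n, q), computed in the log domain."""
    log_p = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
             + k * math.log(q) + (n - k) * math.log(1 - q))
    return math.exp(log_p)

def atypical_probability(N, q, eps):
    """Exact P(|X - qN| > q*eps*N) for X ~ Binomial(N, q)."""
    mean, dev = q * N, q * eps * N
    return sum(binom_pmf(N, k, q) for k in range(N + 1) if abs(k - mean) > dev)

def loss_terms(N, q):
    """Return (eps, p_u) with p_u = 2 exp(-q N eps^2 / 4) (assumed constant 1/4)."""
    eps = math.sqrt(2 * math.log(N * q) / (N * q))
    p_u = 2 * math.exp(-q * N * eps ** 2 / 4)
    return eps, p_u

for N in (200, 1000, 5000):
    eps, p_u = loss_terms(N, 0.5)
    exact = atypical_probability(N, 0.5, eps)
    # columns: N, eps, bound p_u, exact probability, total loss factor eps + p_u + 1/N
    print(N, round(eps, 4), round(p_u, 4), exact, round(eps + p_u + 1 / N, 4))
```

With this choice of $\epsilon$ the assumed bound reduces to $p_u = 2/\sqrt{Nq}$, so the bracketed loss factor in (45) decays on the order of $\sqrt{(\log N)/N}$.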

Lastly, we consider the convergence rate to the upper bound given by Proposition 1. Note that the required
feedback rate (42) is more than the mutual information $i(\epsilon_0,\epsilon_1)$. If we do not wish to allow the feedback to
exceed $i(\epsilon_0,\epsilon_1)$ bits per sub-channel, an additional distortion of $\delta$ can be introduced so that $\epsilon_0$ and $\epsilon_1$ are
replaced by $\epsilon_0 + \delta$ and $\epsilon_1 + \delta'$, respectively, where $\delta' = q\delta/(1-q)$ (so that the input power constraint
(10c) is satisfied). Using a Taylor series expansion, for a small enough $\delta$ we have

$$-2q\left[H_2'(\epsilon_0) + H_2'(\epsilon_1)\right]\delta < i(\epsilon_0+\delta, \epsilon_1+\delta') - i(\epsilon_0,\epsilon_1) < -\frac{1}{2}q\left[H_2'(\epsilon_0) + H_2'(\epsilon_1)\right]\delta. \tag{47}$$
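The first-order behaviour used in (47) can be checked with a short computation. The sketch below assumes, as an illustration only, that $i(\epsilon_0,\epsilon_1) = H_2(p) - qH_2(\epsilon_0) - (1-q)H_2(\epsilon_1)$ with $p = q(1-\epsilon_0) + (1-q)\epsilon_1$, and that $\epsilon_0, \epsilon_1 < 1/2$ so that both derivatives are positive; the function names are hypothetical.

```python
import math

def h2(x):
    """Binary entropy in bits."""
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)

def h2_prime(x):
    """Derivative of the binary entropy function (bits)."""
    return math.log2((1 - x) / x)

def mutual_info(q, e0, e1):
    """Assumed form of i(e0, e1): H2(p) - q H2(e0) - (1 - q) H2(e1)."""
    p = q * (1 - e0) + (1 - q) * e1
    return h2(p) - q * h2(e0) - (1 - q) * h2(e1)

def check_taylor_bounds(q, e0, e1, delta):
    """Return (lower bound, actual change, upper bound) for the change in i."""
    delta_prime = q * delta / (1 - q)   # keeps p, and hence the power constraint, fixed
    change = mutual_info(q, e0 + delta, e1 + delta_prime) - mutual_info(q, e0, e1)
    slope = q * (h2_prime(e0) + h2_prime(e1))
    return -2 * slope * delta, change, -0.5 * slope * delta

print(check_taylor_bounds(0.5, 0.2, 0.1, 1e-3))
```

For small $\delta$ the actual change is close to $-q[H_2'(\epsilon_0) + H_2'(\epsilon_1)]\delta$, which sits between the two bounds.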



Substituting $\epsilon_0 + \delta$ and $\epsilon_1 + \delta'$ for $\epsilon_0$ and $\epsilon_1$ in (42) and using (47), for large enough $N$ and small enough
$\delta$ we have

$$\frac{\log_2 |T_{\hat{S}|S}|}{N} < i(\epsilon_0,\epsilon_1) - \frac{q}{2}\left[H_2'(\epsilon_0) + H_2'(\epsilon_1)\right]\delta + \frac{\log_2 N}{N}. \tag{48}$$

Now choosing $\delta = K_1 \frac{\log_2 N}{N}$, where $K_1 = \frac{2}{q[H_2'(\epsilon_0) + H_2'(\epsilon_1)]}$, gives that the number of feedback bits per
sub-channel satisfies $\frac{\log_2 |T_{\hat{S}|S}|}{N} < i(\epsilon_0,\epsilon_1)$. The additional distortion of $\delta$ will result in a loss in capacity, which
can be quantified by replacing $\epsilon_0$ by $\epsilon_0 + \delta$ in (44), which gives a lower bound on forward rate as

$$C_{\mathrm{fixed}} > C - C\left(\frac{\sqrt{2\log(Nq)} + 2}{\sqrt{Nq}} + \frac{1}{N} + K_1\frac{\log_2 N}{N}\right) \tag{49}$$

which for large enough $N$ can be further lower bounded as (12).



Appendix II
Proof of Proposition 3

We again resort to random coding techniques as in [12], assuming that corresponding to each
channel state vector $s$, the power loading vector $\hat{s}$ is produced with a Bernoulli-$p$ distribution. A randomly
generated codeword $\hat{s}$ is admitted only if it satisfies the empirical probabilities $P_{\hat{s}|s}(0|1) = \epsilon_0$ and
$P_{\hat{s}|s}(1|0) = \epsilon_1$, where $\epsilon_0$ and $\epsilon_1$ are fixed and $q(1-\epsilon_0) + (1-q)\epsilon_1 = p$. This will ensure that, averaged
over all the state vectors, the number of sub-channels used for transmission is $pN$ and the average
number of unused good sub-channels is $q\epsilon_0 N$. Next, we shall find the expected variable-length encoder
rate, averaged over this ensemble of codes, and then, by the usual argument, we can assert that there must
exist at least one set of $\{\hat{s}\}$ that gives performance as good as the average.

Let $L$ represent the random variable denoting the fraction of ones in the state vector $S$. Clearly,
$H(L) \leq \log_2(N+1)$. If a variable-length feedback codebook with average rate of $R_f$ bits per sub-channel
per coherence block is used, then we can write $NR_f < H(\hat{S}) + 1$. The rate can be further
upper bounded as $NR_f < H(\hat{S}, L) + 1 = H(\hat{S}|L) + H(L) + 1$. Averaging over the random codebook
selection we get

$$NR_f < E_{\hat{s}}[H(\hat{S}|L)] + H(L) + 1. \tag{50}$$

Corresponding to a channel state vector with $L = l$, define $q_l$ as the probability that a randomly drawn
codeword $\hat{s}$ is admissible. Then we have

$$q_l = \binom{lN}{\epsilon_0 lN}\binom{(1-l)N}{\epsilon_1 (1-l)N}\, p^{n(l)N} (1-p)^{(1-n(l))N} \tag{51}$$

where $n(l) = (1-\epsilon_0)l + \epsilon_1(1-l)$. Further, it is argued in [12] that given the geometric distribution
$p(k|l) = q_l(1-q_l)^{k-1}$ we have

$$E_{\hat{s}}[H(\hat{S}|l)] = -\sum_{k=1}^{\infty} p(k|l)\log_2 p(k|l) \tag{52}$$

$$< -\log_2 q_l + \log_2 e. \tag{53}$$
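The step from (52) to (53) is the standard bound on the entropy of a geometric distribution. A minimal sketch, with hypothetical function names, that sums (52) directly and compares it with the right-hand side of (53):

```python
import math

def geometric_entropy_bits(q_l, terms=100000):
    """Entropy in bits of p(k) = q_l (1 - q_l)^(k-1), k >= 1, by direct summation."""
    total = 0.0
    for k in range(1, terms + 1):
        log2_pk = math.log2(q_l) + (k - 1) * math.log2(1 - q_l)
        pk = 2.0 ** log2_pk
        if pk == 0.0:          # remaining terms underflow to zero
            break
        total -= pk * log2_pk
    return total

def entropy_bound_bits(q_l):
    """Right-hand side of (53): -log2(q_l) + log2(e)."""
    return -math.log2(q_l) + math.log2(math.e)

for q_l in (0.5, 0.1, 0.01, 0.001):
    print(q_l, round(geometric_entropy_bits(q_l), 4), round(entropy_bound_bits(q_l), 4))
```

The bound becomes tight as $q_l \to 0$, which is the regime of interest here since admissible codewords are exponentially rare in $N$.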
Combining (50) and (53) and using the fact that $H(L) \leq \log_2(N+1)$ we have

$$NR_f < -\sum_{l} (\log_2 q_l)\Pr\{L = l\} + \log_2 e + \log_2(N+1) + 1, \tag{54}$$

where the sum is over all values taken by $L$. Further, applying Robbins' approximation (40) to (51) we have

$$-\log_2(q_l) < -lN H_2(\epsilon_0) - (1-l)N H_2(\epsilon_1) - n(l)N\log_2(p) - (1-n(l))N\log_2(1-p) + \log_2 N + K_3, \tag{55}$$

where $K_3 = \frac{1}{2}\log_2\left[e^{8}(2\pi)^2\epsilon_0\epsilon_1(1-\epsilon_0)(1-\epsilon_1)\right]$. Substituting (55) into (54) and using the fact that
$E[n(l)] = p$ and $E[l] = q$, we get

$$R_f < i(\epsilon_0,\epsilon_1) + \frac{1}{N}\left(\log_2 e + K_3 + \log_2 N + \log_2(N+1) + 1\right). \tag{56}$$



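Since this rate exceeds $i(\epsilon_0,\epsilon_1)$, similar to Appendix I we can introduce additional distortion so that $\epsilon_0$
and $\epsilon_1$ are replaced by $\epsilon_0 + \delta$ and $\epsilon_1 + \delta'$. Therefore, using (47), for large enough $N$ and small enough
$\delta$, (56) yields

$$R_f < i(\epsilon_0,\epsilon_1) - \frac{1}{2}q\left[H_2'(\epsilon_0) + H_2'(\epsilon_1)\right]\delta + \frac{3\log_2 N}{N}. \tag{57}$$

Finally, choosing $\delta = \frac{6\log_2 N}{q[H_2'(\epsilon_0) + H_2'(\epsilon_1)]N}$ gives $R_f \leq i(\epsilon_0,\epsilon_1)$ and capacity as (13).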



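Appendix III
Derivation of $i_l(q_{00}, q_{01}, q_{10}, q_{11})$ and $i_u(\epsilon_0, \epsilon_1)$

The lower bound can be explicitly computed as follows:

$$i_l(q_{00}, q_{01}, q_{10}, q_{11}) = I(S_1; \hat{S}_1 | S_0) \tag{58}$$

$$= H(\hat{S}_1|S_0) - H(\hat{S}_1|S_1, S_0). \tag{59}$$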



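Each entropy term is further computed as

$$H(\hat{S}_1|S_0) = (1-q)H_2\left(\frac{q_{00}+q_{01}}{1-q}\right) + qH_2\left(\frac{q_{10}+q_{11}}{q}\right) \tag{60}$$

and

$$H(\hat{S}_1|S_1,S_0) = (1-q)(1-\delta_{01})H_2\left(\frac{q_{00}}{(1-q)(1-\delta_{01})}\right) + q\delta_{10}H_2\left(\frac{q_{10}}{q\delta_{10}}\right) + (1-q)\delta_{01}H_2\left(\frac{q_{01}}{(1-q)\delta_{01}}\right) + q(1-\delta_{10})H_2\left(\frac{q_{11}}{q(1-\delta_{10})}\right). \tag{61}$$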
Next we compute the upper bound $i_u(\epsilon_0, \epsilon_1)$. Recall that, in order to arrive at the upper bound, we
assumed that conditioned on $S_1$, $\hat{S}_1$ is independent of all other elements in $S$, thus

$$i_u(\epsilon_0, \epsilon_1) = I(S_1; S_2, \hat{S}_1 | S_0)$$

$$= H(S_2, \hat{S}_1|S_0) - H(S_2, \hat{S}_1|S_1, S_0)$$

$$= H(S_2|S_0) + H(\hat{S}_1|S_0, S_2) - H(\hat{S}_1|S_1) - H(S_2|S_1).$$

Each entropy term can be further computed as

$$H(S_2|S_0) = qH_2\left((1-\delta_{10})^2 + \delta_{10}\delta_{01}\right) + (1-q)H_2\left(\delta_{01}(1-\delta_{10}) + (1-\delta_{01})\delta_{01}\right),$$

$$H(\hat{S}_1|S_1) = qH_2(\epsilon_0) + (1-q)H_2(\epsilon_1),$$

$$H(S_2|S_1) = qH_2(\delta_{10}) + (1-q)H_2(\delta_{01})$$

and

$$H(\hat{S}_1|S_0, S_2) = \left((1-\delta_{01})^2(1-q) + \delta_{10}^2 q\right)H_2(w_{00}) + 2\left(\delta_{01}(1-\delta_{01})(1-q) + \delta_{10}(1-\delta_{10})q\right)H_2(w_{01}) + \left(\delta_{01}^2(1-q) + (1-\delta_{10})^2 q\right)H_2(w_{11}),$$

where the probabilities in the argument of the binary entropy functions are defined as

$$w_{s_0 s_2} = P_{\hat{S}_1|S_0,S_2}(0|s_0, s_2), \quad s_0, s_2 = 0 \text{ or } 1.$$

We note that $w_{10} = w_{01}$ and the probabilities can be explicitly computed as

$$w_{00} = \frac{(1-\epsilon_1)(1-\delta_{01})^2(1-q) + \epsilon_0\delta_{10}^2 q}{(1-\delta_{01})^2(1-q) + \delta_{10}^2 q},$$

$$w_{01} = \frac{(1-\epsilon_1)\delta_{01}(1-\delta_{01})(1-q) + \epsilon_0\delta_{10}(1-\delta_{10})q}{\delta_{01}(1-\delta_{01})(1-q) + \delta_{10}(1-\delta_{10})q},$$

$$w_{11} = \frac{(1-\epsilon_1)\delta_{01}^2(1-q) + \epsilon_0(1-\delta_{10})^2 q}{\delta_{01}^2(1-q) + (1-\delta_{10})^2 q}.$$
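The four entropy terms of $i_u(\epsilon_0,\epsilon_1)$ can be assembled numerically. The sketch below is illustrative only and rests on stated assumptions: $\delta_{01}$ and $\delta_{10}$ are read as the $0 \to 1$ and $1 \to 0$ transition probabilities of a stationary two-state Markov chain with $\Pr\{S = 1\} = q$, $\epsilon_0 = \Pr\{\hat{S}_1 = 0 | S_1 = 1\}$ and $\epsilon_1 = \Pr\{\hat{S}_1 = 1 | S_1 = 0\}$. The function name is hypothetical.

```python
import math

def h2(x):
    """Binary entropy in bits, with H2(0) = H2(1) = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)

def upper_bound_iu(q, d01, d10, e0, e1):
    """i_u = H(S2|S0) + H(S1hat|S0,S2) - H(S1hat|S1) - H(S2|S1)."""
    # Joint probabilities of (S0, S2), obtained by summing over S1.
    p00 = (1 - d01) ** 2 * (1 - q) + d10 ** 2 * q
    p01 = d01 * (1 - d01) * (1 - q) + d10 * (1 - d10) * q   # same value for (1, 0)
    p11 = d01 ** 2 * (1 - q) + (1 - d10) ** 2 * q
    # w_{s0 s2} = P(S1hat = 0 | S0 = s0, S2 = s2)
    w00 = ((1 - e1) * (1 - d01) ** 2 * (1 - q) + e0 * d10 ** 2 * q) / p00
    w01 = ((1 - e1) * d01 * (1 - d01) * (1 - q) + e0 * d10 * (1 - d10) * q) / p01
    w11 = ((1 - e1) * d01 ** 2 * (1 - q) + e0 * (1 - d10) ** 2 * q) / p11
    h_s2_given_s0 = (q * h2((1 - d10) ** 2 + d10 * d01)
                     + (1 - q) * h2(d01 * (1 - d10) + (1 - d01) * d01))
    h_hat_given_s1 = q * h2(e0) + (1 - q) * h2(e1)
    h_s2_given_s1 = q * h2(d10) + (1 - q) * h2(d01)
    h_hat_given_s0s2 = p00 * h2(w00) + 2 * p01 * h2(w01) + p11 * h2(w11)
    return h_s2_given_s0 + h_hat_given_s0s2 - h_hat_given_s1 - h_s2_given_s1

# Correlated example: stationary probability q = d01 / (d01 + d10) = 0.3.
print(upper_bound_iu(0.3, 0.06, 0.14, 0.2, 0.1))
```

A useful consistency check is the memoryless case $\delta_{01} = q$, $\delta_{10} = 1-q$, where the expression collapses to $H_2(p) - qH_2(\epsilon_0) - (1-q)H_2(\epsilon_1)$ with $p = q(1-\epsilon_0) + (1-q)\epsilon_1$.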
Appendix IV
Proof Sketch of Proposition 5

Consider the maximization of the capacity $C(m,t)$ in (31) over the group size $m$ and threshold $t$
subject to $GH(q) \leq B$. The capacity can be bounded as

$$Nq\log\left(1 + \frac{Pt}{Nq}\right) \leq C(m,t) \leq Nq\log\left(1 + \frac{Pt}{Nq} + \frac{P}{Nq}\right). \tag{73}$$



The lower bound follows by observing that the logarithm term in (31) takes on its minimum value at
the boundary $r = t$, whereas the exponential term integrates to 1. The upper bound can be shown using
the fact that $\int_0^{\infty} e^{-x}\log(x+a)\,dx \leq \log(1+a)$ for all $a > 0$. Clearly, the maximum value of $C(m,t)$
subject to $GH(q) \leq B$ is no greater than the maximum value of the upper bound of $C(m,t)$ in (73)
subject to the same constraint. Next we obtain a solution to the latter optimization problem and show
that for large $B$ and $N$, it provides a good approximation to the solution of the original optimization
problem. Without loss of generality, substituting $w = Nq$ the optimization problem can be written as

$$\max_{w,t}\; C = w\log\left(1 + \frac{Pt}{w} + \frac{P}{w}\right), \quad \text{subject to: } wt \leq B. \tag{74}$$
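The integral inequality used for the upper bound in (73) is an instance of Jensen's inequality for a unit-mean exponential random variable. It can be spot-checked numerically with a plain midpoint rule; the sketch below uses natural logarithms, and the truncation point and step count are arbitrary illustrative choices.

```python
import math

def exp_weighted_log(a, upper=50.0, steps=200000):
    """Midpoint-rule estimate of the integral of exp(-x) * log(x + a) over [0, upper]."""
    h = upper / steps
    total = 0.0
    for j in range(steps):
        x = (j + 0.5) * h
        total += math.exp(-x) * math.log(x + a)
    return total * h

for a in (0.5, 1.0, 5.0, 50.0):
    print(a, round(exp_weighted_log(a), 5), round(math.log(1 + a), 5))
```

The gap between the two sides narrows as $a$ grows, which is consistent with the upper and lower bounds in (73) sharing the same leading behaviour when $Pt/(Nq)$ is large.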

Assuming that the feedback constraint is tight, i.e., $wt = B$, the optimum $w$ must satisfy

$$(1 + u + P/w)\log(1 + u + P/w) = (1 + 2u + P/w), \tag{75}$$

where $u = PB/w^2$. A closed-form solution to (75) seems difficult; however, insight can be obtained
by assuming that $N$, $B$ are large. In addition, we assume that the optimal $w$ increases with $B$ such that
$u \gg P/w$ or, equivalently, as $B \to \infty$, $w/B \to 0$. We will see later that this is indeed true. Therefore,
observing that the $P/w$ terms in (75) are small compared to $u$, the optimal $w^* = \sqrt{PB/u^*} + o(1)$, where
$o(1)$ is vanishingly small as $B \to \infty$, and $u^*$ is the solution to $(1+u^*)\log(1+u^*) = (1+2u^*)$. Since
we assume that $wt = B$, solving (74) gives

$$t^* = \sqrt{\frac{u^* B}{P}} + o(1) \tag{76}$$



$$m^* = \sqrt{\frac{P}{u^* B}}\,\log\left(N\sqrt{\frac{u^*}{PB}}\right) + o(1) \tag{77}$$



$$C^* = \sqrt{\frac{PB}{u^*}}\,\log(1+u^*) + o(1). \tag{78}$$

The parameter values satisfy the original feedback constraint $GH(q) = B$ and in fact the lower bound
in (73) also behaves as (78) (although the value associated with the $o(1)$ term changes). This implies that
the optimal parameters that maximize the capacity $C(m,t)$ satisfy (76) and (77), and the capacity $C^*$ is
approximated by (78) to within a vanishingly small term.

Note that (77) implies that for $m^* > 1$ we should have feedback in the range $B < \frac{P}{u^*}(\log N)^{2-\eta_1}$ for
large $N$, where $\eta_1 \in (0, 2)$ is the solution to

$$\log N - \log\left[\frac{P}{u^*}(\log N)^{1-\eta_1/2}\right] = (\log N)^{1-\eta_1/2}. \tag{79}$$



It is easy to see that $\eta_1 \to 0$ as $N \to \infty$. Next we solve for the optimal parameters when
$B > \frac{P}{u^*}(\log N)^{2-\eta_1}$. Again we first solve the upper bound maximization problem (74) with $m = 1$ or,
equivalently, $q = e^{-t}$. Namely

$$\max_{t}\; C = Ne^{-t}\log\left(1 + \frac{P(t+1)}{Ne^{-t}}\right). \tag{80}$$



Assuming that the feedback constraint is tight, i.e., $Nte^{-t} = B$, we get the optimal threshold and upper
bound on capacity

$$t^* = \log\frac{N\log N}{B} + o(1) \tag{81}$$

$$C^* = \frac{B}{t^*}\log\left(1 + \frac{P (t^*)^2}{B}\right) + O(1). \tag{82}$$



Again, it can be checked that with appropriate adjustments to the $o(1)$ and $O(1)$ terms in (81) and
(82), respectively, the threshold (81) satisfies the original feedback constraint $NH(e^{-t}) = B$ and the
lower bound in (73) also behaves as (82). This implies that the threshold that maximizes the capacity
$C$ in the feedback range $B > \frac{P}{u^*}(\log N)^{2-\eta_1}$ satisfies (81), $m^* = 1$, and the capacity $C^*$ is also given by
(82).



Furthermore, note that as $B$ increases, the threshold (81) decreases. However, decreasing the threshold
beyond a certain optimal value decreases the capacity upper bound in (80). The optimum value of the
threshold that maximizes the upper bound in (80) is given by (33) and the corresponding upper bound
is given in (32), corresponding to $B > B_{\max}$, where $\eta_2 \in (0, 1)$ is the solution to

$$\log N - \log\left[P(\log N)^{1+\eta_2}\right] = (\log N)^{(1+\eta_2)/2}. \tag{83}$$



Clearly, $\eta_2 \to 1$ as $N \to \infty$. Substituting $m = 1$ and (33) into the lower bound in (73) gives that
the lower bound and hence the capacity also behave as in (32). Therefore, the optimal threshold that
maximizes the capacity is given by (33) with an adjusted $o(1)$ term. Corresponding to (33), the maximum
required feedback is given by $B_{\max} = NH(e^{-t}) = P(\log N)^{2+\eta_2} + o(\log^2 N)$.
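To get a feel for how slowly $\eta_2$ approaches 1, equation (83) can be solved by bisection. The sketch below assumes natural logarithms in (83) and treats $P$ as a plain positive number; the function name and the sample values of $N$ and $P$ are illustrative only.

```python
import math

def eta2(N, P, tol=1e-12):
    """Solve log N - log[P (log N)^(1+eta)] = (log N)^((1+eta)/2) for eta in (0, 1)."""
    L = math.log(N)

    def f(eta):
        # Left side minus right side of (83); decreasing in eta when log N > 1.
        return L - math.log(P * L ** (1 + eta)) - L ** ((1 + eta) / 2)

    lo, hi = 0.0, 1.0
    if not (f(lo) > 0 > f(hi)):
        raise ValueError("no sign change on (0, 1) for these N and P")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

for N in (1e6, 1e12, 1e24):
    print(N, round(eta2(N, 1.0), 4))
```

Under these assumptions the root increases with $N$ but is still noticeably below 1 even for very large $N$, so the asymptotic exponent in $B_{\max}$ is approached slowly.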